Borje Langefors 


Theoretical Analysis of Information Systems


SECOND VOLUME 


STUDENTLITTERATUR 

LUND 

AKADEMISK FORLAG 
KØBENHAVN
UNIVERSITETSFORLAGET 
OSLO, BERGEN 



©Studentlitteratur 1966 
Borje Langefors 

Second edition 
Second printing 
Studentlitteratur 
Lund, Sweden 1968 


CONTENTS


FIRST VOLUME


Chapter 1. Basic Problems of Systems Theory

11.1 Needs for a Formal Systems Theory
11.2 Common Faults in Systems Design and Analysis
11.3 Different Kinds of Systems Study
11.4 Systems Engineering
11.5 Structural Systems Theory, Electric Networks and Elastic Systems
11.6 Mathematical Systems Theory as a System
11.7 Other Kinds of Systems Study
11.8 Elements of a Systems Theory
11.9 The Systems Analysis Approach
11.10 The Fundamental Principle of Systems Work
11.11 General and Special Properties of Systems Problems
11.12 System, Subsystem, Parts and Boundaries
11.13 Structure Types of Systems
11.14 System Partitioning
11.15 System Partitioning of Outer Boundary
11.16 A Sketch of a Basic Theory of Systems Analysis


Chapter 2. Systems Algebra

12.1 Algebraic Tools for Describing Systems
12.2 Precedence Operator for a System
12.3 Use of the Precedence Matrix P to determine the Precedents
12.4 Matrix by Matrix Composition. $P^n$, n-th Precedence and Paths
12.5 Succedence Matrix P-p
12.6 Generalization of the Precedence Concept
12.7 A Generalized Matrix by Vector Operation
12.8 P, the 1-dimensional Precedence Matrix
12.9 $P^{01}$ and $P^{10}$
12.10 Relations between $P^{01}$ and $P^{11}$ and $P^{00}$
12.11 Definition of $E^{10}$, the Incidence Matrix
12.12 Built-up Systems and Gross Systems
12.13 System Connections, Boundary Operation and Cycles
12.14 Definition of M, the Coincidence Matrix
12.15 Positional Operator for The System Graph
12.16 Simple Paths and Closed Paths in a System Graph
12.17 Transposed Positional Operator, Forward Positioning
12.18 General Positioning
12.19 Requirements Computation and Scheduling
12.20 Determining the Boundary Operator from M and the Part Boundary Operator R x J
12.21 Boundary Operator for Generalized Systems




Part 2. Information Systems Theory


Chapter 1. Information Systems

21.1 Information System Design
21.2 Formalization of Information System Design
21.3 Component Problems of Information Systems


Chapter 2. The Function of an Information System

22.1 The Function of an Information System
22.2 Two tasks of an Information System
22.3 Operative Information Requirements. An Example
22.4 The value of Directive Information
22.5 Effect of Time for Decision-Making
22.6 Transient Decision Situation. Satisficing
22.7 Information needed in a simplified model of a manufacturing shop


Chapter 3. The Economic Quantity of Information

23.1 The Economic Quantity of Information
23.2 Information Value as an Information System Design Parameter
23.3 Information and System Control
23.4 The Meaning of Information within a System
23.5 The Value of Information in a System
23.6 Data Representation of Information in a System. Volume of data
23.7 The Information System for a Simple Inventory
23.8 Operative versus Directive Information
23.9 An Example of Optimum Reduction of Information Processing for a Simple Inventory
23.10 Information System for a Simple Work Station


Chapter 4. Some Problems of Information Systems

24.1 Complexity of an Information System

SECOND VOLUME

Chapter 5. Precedence Relations Between Information Sets in an Information System

25.1 Data Structure of an Information System
25.2 On the Definition of Elementary Files
25.3 Inference Problems in Information System Design
25.4 A further Illustration to Information Precedence and Elementary File Definition; Computation of Weekly Salary

Cost Distribution of Job Costs as Another Illustration of Discussing Elementary Files
Identification of Precedence Information
Use of the Information Precedence Matrix $P^{00}$ for Compatibility Checking
Some Other uses of the Precedence Matrix P


Chapter 6. Data and Information Files

26.1 Data and Information Files
26.2 Size of Data Terms and Precision Required

Chapter 7. Files, Computations and Processes

27.1 Files and Processes
27.2 The Size of a File
27.3 File Volume and Transport Volume. Processing Period
27.4 Transport Factor
27.5 Topological Transport Factor
27.6 Grouping of Computations into One Process
27.7 Incidence Matrix of Process and Transput


Effect of Process Grouping on the Transport Factor
Memory Requirement associated with Process Grouping
Computer Programs and Memory Space for a Process
Example of Process Grouping with Memory Limitation


Chapter 9. 


>ut Equipment Consic 


29.1 Reducing the Number of Transput Equipment 

29.2 Adaptation to Hardware System 


Information System Design Computations 

Joining Rows in E^ to Represent Grouping of Processes 

Representing Process Grouping by a Generalized Matrix Operation 

Matrix Operations to Compute File Transport Reduction and Memory Space 

Calculations for Minimum File Transport Design 

Procedures for Aiding the Intuitive System Design Phases 

Defining File Consolidation by Matrix Operation on E^ 

Influence on Programming Language Development 



Chapter 11. File Storage Considerations

211.1 Files in Systems Using Mass Memories of Pseudo Random Access Type
211.2 Direct Processing versus Batched Processing

Chapter 12. File Organization

212.1 Record Lay-Outs
212.2 Record Organization

Chapter 13. System Reliability

213.1 Reliability of an Information System
213.2 Means for Checking Input Data

Chapter 14. The Problem of Optimum Grouping of Information Processes

214.1 The Problem of Optimum Grouping of Information Processes
214.2 Special Case: Grouping Processes in Pairs
214.3 The Problem of Optimum Pairing without Memory Constraint

Part 3. Some Data Processing Problems

Chapter 1. Relation between a Process and its Files

31.1 Relation between a Process and its Files
31.2 Some Basic Problems of File Processing
31.3 K-Progressive Process
31.4 Rectangular File Processing and Group Access
31.5 Retrieval of File Records for a Process

Chapter 2. Influence of Data Structure

32.1 Influence of Word Length on Tape Recording Speed


References






Part 2. 






1. Data Structure of an Information System.


One problem of organizing data is concerned with the data transport needed to
bring data from the storage place to the processor. Another is to organize data
in a way that is meaningful for people. 1) In the context of the first of these
problems it seems appropriate to consider such data as are stored in the main
memory of the processor to be used for processing (or in any memory in which
they are available in one basic word-time) as being available without transport
work. The problem of data organization then is how to store data in mass
storage so as to minimize the work needed to transport these data to the main
store when needed, or to make them available within the time required.

1 In business information systems it is common that data can be retrieved from 
files during a simple scanning of the file (and the batch of transactions) after
sorting of transactions. When this is not possible random access is often said 
to be required. Contrary to this, the situation in engineering data processing 
is often one where more complicated magnetic tape handling is required. This 
has also been found to be the case in many applications such as production 
scheduling. 

Insofar as the so-called random access memories require a general access
time that is much longer than the word time of the processor, while having
smaller access time to data which are placed in neighbouring positions than to
those in other positions, the data transport minimization is not a problem only
for so-called serial access memories. In fact these two classes of memories are
merely different instances of one and the same kind of pseudo-random access
systems (as was pointed out by the author 2)). It will be shown later 3) how
many data handling routines which cannot be run as a simple scanning of the
master file involved can be classified as a more general handling type called
rectangular handling. Obviously the type of handling of files required must be
considered when searching for minimum transport system solutions. Likewise, it
is obvious that the processing period, that is the time interval between two
processing runs of the same file, is of importance for the transport work.

2 Analysis of the information needs of the firm. 

We make the basic assumption about the firm that a set of "functions" can be
defined which must be in action in order that the firm shall be able to fulfil
its objectives. In fact it seems probable that the only way to define the
objectives of a firm is to define its "functions".

Our approach to the system analysis will be to start by defining the basic
functions of the firm and then go on to find the relation of each function to
the set of information classes available. 4)


1) For instance: input has to be organized to system specification but also so 
that it is meaningful to the people gathering it before input. 

2) 3 La 1961-1. 

3) See 31.6.

4) We refer to sections 22.3 and 22.4 for a simple illustration.
3 The basic functions of the firm.


We take as the basic functions of the firm such operations as the firm has to
perform in order to fulfil its objectives, and also such other operations which
are not directly concerned with the objectives but are indirectly necessary in
order for the firm to be able to perform the directly necessary operations.

Thus a firm which has as its main objective to produce a set of objects has, of
course, the directly necessary function of running the production. Another
directly necessary function is buying the raw materials out of which the
products are made. Indirect functions are, for example, the paying of salaries
to the employees or the cost accounting needed, for instance, to serve the
directly necessary function of setting prices on products and controlling the
efficiency of the firm.


In defining the functions it is important to isolate each single function on
the basis of its proper reasons for existence, if possible without being too
much influenced by its present organization. To take a simple example: the
function of paying salaries is concerned with the computation of the salary for
each individual person at each salary period, that is to compute the amount
payable after deductions, and to associate this amount with an identification
of the person concerned.


It is not within the scope of this function, however, to define which different 
products or customer orders are to be charged with the amount paid. The fact 
that this is normally computed in connection with payroll processing corresponds 
to a common solution to the problem of economic data transport and is not to be 
mistaken for an indication that it belongs to the function of salary pay. The same 
is true for many other statistical data produced from payroll information.


4 The Information Needs of Firm Functions. 

For every function of the firm a set of different information classes, or
elementary information sets, has to be provided. This information will be needed
either to monitor the performing of a function or as a basis for a decision which
has to be made by a human in order to control a function. An example of the
first kind is the salary pay, where the computed amount per person can directly
be used to trigger a pay action. The output of a sales forecasting function, by
contrast, will clearly be used only as one basis for human decision making.

5 The first step in the analysis of information need is to define, for each
function of the firm, the different classes of information needed. Further, for
each such class we also have to determine the requirement of information as a
function of time. Thus, for instance, the salary pay function calls for the
information specifying for each person concerned his identification and the
amount to be paid to him. In addition it will also, in most cases, be required
to list all deductions made for each person.


1) See also sections 22.3 and 22.4 for simple illustrations. 
This information will be required at each pay period, each week for instance.

In this way for each function a set of required information sets is defined and for 
each of these another set of information sets required for its production. 

6 The definition of the basic functions does not seem to be computable by any
routine procedure at present. Rather it has to be defined by careful analysis
of the goals of the firm. This seems difficult and will certainly be so regarded
by most firms. In fact it is believed to be in many cases an obstacle to taking
an analysis of this kind as a first step to automatization of information
processing. However, it should be obvious on second thought that no firm can be
assumed to work in a rational way if it has not, at least, clearly defined its
different functions and the information needed for each of them. This is not a
problem that can be solved in a day or two. It is well worth much effort and
even long research. This is to say that a firm which really knows its objectives
and how to fulfil them should find no difficulty in defining the basic functions
and their information requirements. In fact one of the most important functions
of management should be to work continuously on the problem of defining goals
and functions for the firm.


Although no procedure for routine or automatic solution of this basic problem
exists, it is obvious that in principle this task can be made fairly easy by
listing all the functions, and their direct information needs, of firms of
different types. Thus to specify these entities for any firm would mainly call
for consulting such a list or set of lists for firms of the type in question
and deciding which functions to accept and which to omit.

7 Such lists may not be available today. It will not be long after the introduction
of an approach like ours, however, before they will become available, and in a
continuously improving set.

8 In order to see how the information sets which are necessary or have potential
value for the control of a managed system may be determined, together with
their interrelations, we refer to an example used earlier. 2) We give in fig. 1,
in a modified way, a part of the simple system, containing only "store", "shop"
and "sales" as firm functions. Of the candidates for information to be used for
"refill order" we have assumed, for simplicity, that it has already been decided
that "substitutable items" and "production related items" are not to be
considered in the study.

9 We may regard fig. 1 as a precedence graph. We then say that information which
is to be used for a firm function is precedent information for that function.
Likewise such information as is used to produce other information is said to be
its precedents. Even our very simplified information system is seen to have a
rather complicated structure. Precedence relations of fairly high order can be
found in fig. 1. Indeed the information set "8" (inventory status) is seen to be
a virtually infinitely iterated precedent of itself. The same is true for "13"
(rate of demand) and "14" (delivery lead time), as these hold information which
is predicted by means of earlier values of the same kind.

1) As is illustrated in sections 22.3 and 22.4 also an intuitive analysis is able
to produce long lists of candidate information classes.

2) Section 22.4.


10 We may represent the information system with its precedence relations by the
precedence matrix P (= $P^{00}$).

The fact that "8", "13" and "14" are precedents of themselves is indicated by a
unit in the boxes (8,8), (13,13) and (14,14) in P.


















11 We may modify the information system in such a way that the apparently
infinite iterations of "8", "13" and "14" are avoided. To do this we observe that
the inventory status is changed during each of the elementary processes which
take one of the information precedents of "8" as an input. These precedents
therefore also have to be used in a certain logical order, so that each of them
feeds into an elementary process which has the best new version of "8" as
another input and produces a still newer (updated) version of "8". We denote the
starting inventory status file by "8" and "update" it with "7" (if any) to
produce the new version, to be denoted "8^(1)". This in its turn is updated by a
second elementary process with "1" as another precedent to produce "8^(2)". Then
"5" is used to produce "8^(3)", and finally a process which uses "4" as a
precedent produces the "fully updated version" "8'" (8 prime). "8'" is then
used as the starting version for the next processing run. In this way we obtain
the system of fig. 2.

For "13" and "14" similar reasoning holds.
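The unrolling of a self-precedent into a chain of versions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `unroll_updates`, the dictionary layout and the version tags `^(1)`, `^(2)`, ... are ours and hypothetical; only the order of the updating precedents ("7", "1", "5", "4") is taken from the text.

```python
def unroll_updates(name, updaters):
    """Replace the self-precedence of `name` by a chain of versions.

    Each updating precedent feeds one elementary process that takes the
    newest version of `name` as its other input. The last process yields
    the fully updated version, written name + "'".
    Returns {produced set: [its precedents]}.
    """
    relations = {}
    current = name
    for step, updater in enumerate(updaters, start=1):
        last = step == len(updaters)
        # Hypothetical tag convention: ^(k) for intermediate versions.
        produced = f"{name}'" if last else f"{name}^({step})"
        relations[produced] = [current, updater]
        current = produced
    return relations


chain = unroll_updates("8", ["7", "1", "5", "4"])
for produced, inputs in chain.items():
    print(produced, "<-", ", ".join(inputs))

# After unrolling, no version is listed among its own precedents.
no_self_precedence = all(p not in inputs for p, inputs in chain.items())
print(no_self_precedence)
```

The fully updated version "8'" of one run would then be renamed "8" to start the next run, which is why the chain never has to refer to itself.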
The associated precedence matrix P is
12 Every column in P corresponds to an elementary process which produces the
information set whose name (or identifying number) is attached to the column.

By inspecting a column of P we see which information sets are its precedents
and, therefore, are inputs to the process producing the information set
associated with the column. By inspecting a row of P we find to which elementary
processes its information set is an input (after having excluded "1", "2" and
"3", which are firm functions and no outputs of processes). We find that, for
instance, "8^(3)" is an input to the processes producing respectively "4" and
"8'", and "11" is an input to "9" and "10". We conclude that "8^(3)" has to be
input to two different elementary processes and, hence, that if these two
processes could be grouped together into one composite process then "8^(3)"
would only need to be input once. Thus one input operation would be discarded.
Further it is seen that if "9" and "10" could be grouped into one process one
would save one input operation (out of two) for each of "11", "12", "13'" and
"14'". It is seen that these possibilities for saving input operations can all
be detected by inspecting the rows of P.
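The row inspection can be mimicked as follows. This is a minimal sketch, not the author's procedure: the function names `rows_of` and `input_savings` are hypothetical, and the relation table holds only the inputs of "9" and "10" that the text names explicitly.

```python
from collections import defaultdict

# Inputs of two elementary processes, limited to those named in the text.
precedents = {
    "9": ["11", "12", "13'", "14'"],
    "10": ["11", "12", "13'", "14'"],
}


def rows_of(precedents):
    """Turn the column view (set -> its precedents) into the row view
    (set -> processes that take it as an input)."""
    rows = defaultdict(list)
    for produced, inputs in precedents.items():
        for inp in inputs:
            rows[inp].append(produced)
    return dict(rows)


def input_savings(precedents, group):
    """Input operations saved when the processes in `group` are merged:
    a set read by k grouped processes is then read once, saving k - 1."""
    saved = 0
    for users in rows_of(precedents).values():
        k = sum(1 for u in users if u in group)
        saved += max(k - 1, 0)
    return saved


print(rows_of(precedents)["11"])               # processes reading "11"
print(input_savings(precedents, {"9", "10"}))  # one saving per shared input
```

Whether such a group is actually admissible is a separate question; the count only shows what would be gained.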

13 There is another possible saving of both input and output operations that is
not so detected. It is instead detected still more easily. This is the fact that
all the intermediary versions of "8", that is "8^(1)", "8^(2)" and "8^(3)", might
all be eliminated if the processes corresponding to them could all be grouped
together. Thus in this way three output operations and three input operations
would be saved. This is found simply by inspecting the identifying numbers of
the information sets, provided the identifiers associated with different
versions of the same set are equal save for a tag to identify the individual
version.

14 Whereas it is easy to see from the matrix P how input or output operations
would be saved if certain processes were grouped together, it is not so
directly visible whether the grouping can actually be made. We find from fig. 2
that, for instance, the process producing "8^(3)" cannot be grouped with that
producing "8'", for before "8'" can be produced we must produce "4", and
"8^(3)" is a precedent of "4", that is "8^(3)" < "4". We shall see later how
this kind of condition can be checked also on the precedence matrix using
generalized matrix algebraic operations such as we have met in the system
algebra.
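A grouping is blocked whenever some process outside the group has to run between two of its members. The sketch below checks this by plain graph search rather than by the generalized matrix operations the text refers to. The names `all_precedents` and `can_group` and the version tags `^(1)`, `^(2)`, `^(3)` are our own assumptions, and the relation table contains only the precedences stated in the text.

```python
# Precedence relations named in the text (set -> its precedents).
precedents = {
    "8^(1)": {"8", "7"},
    "8^(2)": {"8^(1)", "1"},
    "8^(3)": {"8^(2)", "5"},
    "4": {"8^(3)"},
    "8'": {"8^(3)", "4"},
}


def all_precedents(target, precedents):
    """Every set that precedes `target`, directly or through a chain."""
    seen, stack = set(), [target]
    while stack:
        for p in precedents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def can_group(group, precedents):
    """False if some process outside the group must run between two
    members, i.e. it needs one member's output and feeds another."""
    group = set(group)
    for outsider in precedents:
        if outsider in group:
            continue
        needs_member = bool(all_precedents(outsider, precedents) & group)
        feeds_member = any(
            outsider in all_precedents(m, precedents) for m in group
        )
        if needs_member and feeds_member:
            return False
    return True


print(can_group({"8^(3)", "8'"}, precedents))        # "4" lies in between
print(can_group({"8^(1)", "8^(2)", "8^(3)"}, precedents))
print(can_group({"8^(3)", "4", "8'"}, precedents))   # "4" taken into the group
```

Taking the blocking process into the group, as in the last call, is one way of making the grouping admissible.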

15 In the precedence matrices we have marked off the rows and columns 1, 2 and 
3 as these are distinct from the rest by being associated with firm functions 
rather than with processes of the information system itself.

16 It is seen from P that "3, 8, 11, 12, 13 and 14" have empty columns and thus
have no precedents and so are initial sets. Of these "8, 13 and 14" are special
by having also an updated version, "8'", "13'" and "14'" respectively. This
indicates that the initial "8" is taken from the "8'" of the next preceding run,
and similarly for "13" and "14". The other sets, by not having updated versions,
are found to be true initials, which means, here, that they originate outside
the information system described. This is important to see, for it means that
these sets have to be provided from outside and must therefore be designed for
the double purpose of being conveniently and reliably handled both by the
information system and by the outside originators, which are often human beings.
We must observe that in this sense also, each time a firm function (e.g. "1", "2"
and "3") occurs as a precedent it will be originating information from outside
the information system described.

17 We thus see that also "1” as a precedent to ”8 " ' ” is to be regarded as an 
initial information set. We may denote it by "l ft .. . Likewise ”2 7 ”, ”3,-” 
and ”3jg” are initials. 

18 In a similar way sets which are precedents to firm functions are to be regarded 
as terminal sets. In fact they are reported from the information system to the 
firm functions. We see from P that "4 2 ”, ’’S-j", ”6!” and ”7 1 " are ter¬ 
minal sets. We also see that there are no others, for P has no empty rows. 

19 As we have seen the important function of defining the initial and the terminal
information sets is well served by the precedence matrix. 1)

We now look back at the way we determined information needs of a firm (para.
2 and 4). Let us now assume that the task of listing the functions and their
information needs has been accomplished. Let us assume, as a simple example,
that the firm is so simple that only two functions are to be served. Let us call
them A and B.

1) Continued in 27.2-5.
[...] functions we have found the [...]

a, b, and c,

[...] information which is [...] motivated for a does actually play a
significant numerical role. 2) See also examples below.

Suppose we find in this way that to obtain a we need four information sets,
e, f, g, h say. One of these may represent a set of computer programs.

22 Example.

In order to determine which information is needed to produce information about
the [...] order to use we do not have to work out the [...] formula. To work out
the formula we have first to decide which parameters to consider in the formula.
Then, as a second step, we do a detailed analysis of the mathematical
relationship among these parameters. Our technique is to postpone the second,
detailed mathematical step.

23 Example.

To obtain production demands on level i we first decide that the production
[...] on level [...] is the information required. Only thereafter do we
determine the formula, saying that extended matrix algebra [...] is to be used.
Again, in the phase of [...] only the first part of the problem is to be solved.

24 In information processing it is common to produce, from some precedents one
of which we denote by x, a new version of x. We shall use the notation x' to
denote the updated version of x. Obviously

$x < x'$

25 If we have the pair of precedence relations

$\mathcal{P}(a) = b, c, d$; $\mathcal{P}(c') = c, d$;

then we shall always suppose, unless otherwise stated, that c' is first produced
and then a is produced after c has been replaced by c', whenever this is
possible. More generally we shall use the rule (when no specific ordering is
prescribed):

whenever $x < y \wedge y \nless x'$ then $x' < y$

or in words: whenever x precedes y and y does not precede x' then we take x'
to precede y, instead of x.

1) [...] formulas associated with the precedence relation itself, that is
formulas which compute information from its precedents. We [...] possible use
of formal methods which could establish the precedence sets.

2) An example of this kind of analysis is shown in section 25.4.
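The substitution rule (whenever x precedes y and y does not precede the updated version x', let x' precede y instead of x) can be tried out on a small pair of relations. The following is a sketch under our own assumptions: the names `precedes` and `apply_update_rule` are hypothetical, an updated version is recognised by a trailing prime, and the two relations used, with a depending on b, c, d and c' depending on c, d, are our reading of the pair given in the text.

```python
def precedes(x, y, precedents):
    """True if x precedes y, directly or through a chain of relations."""
    seen, stack = set(), [y]
    while stack:
        for p in precedents.get(stack.pop(), ()):
            if p == x:
                return True
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


def apply_update_rule(precedents):
    """Wherever x < y, an updated version x' exists and y does not
    precede x', let y use x' instead of x."""
    result = {}
    for y, inputs in precedents.items():
        new_inputs = []
        for x in inputs:
            updated = x + "'"
            use_updated = (
                updated in precedents        # x' is produced in the system
                and y != updated             # x' itself still needs x
                and not precedes(y, updated, precedents)
            )
            new_inputs.append(updated if use_updated else x)
        result[y] = new_inputs
    return result


relations = {"a": ["b", "c", "d"], "c'": ["c", "d"]}
resolved = apply_update_rule(relations)
print(resolved)
```

In the result a takes c' in place of c, so c' has to be produced first, while c' keeps the old c as its own input.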

When a set a contains only elements which are also contained in another set q
then we denote this fact by writing

$q \supseteq a$ or $a \subseteq q$

and a is said to be a subset of q in case we do not know whether q = a may be
true or not. If $q \supseteq a$ and $q \neq a$ then we write

$q \supset a$ or $a \subset q$

and call a a true subset of q. It is obvious that it is of interest to know which
precedence sets are subsets of other precedence sets. Therefore when we list
a set of precedence relations we shall try to group them by subset relations.

We shall have occasion to use the concept of set intersection $x = y \cap z$,
which will be defined as the set x such that $q \in x$ if and only if $q \in y$
and $q \in z$ (that is, q is a member of x exactly when it is a member of both
y and z).

For example let

y = a1, a2, a3, a4
z = a2, a4, a6, a8

then $x = y \cap z$ = a2, a4.

Obviously, if $z \subset y$ then $y \cap z = z$.
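The same example can be checked with Python's built-in sets, where `&` is intersection, `<=` is the subset test and `<` the true-subset test. The set `w` is a hypothetical extra set added only to illustrate the last remark.

```python
y = {"a1", "a2", "a3", "a4"}
z = {"a2", "a4", "a6", "a8"}

x = y & z                      # set intersection
print(sorted(x))

# x is a subset of y, and in fact a true subset.
print(x <= y, x < y)

# If w is a subset of y, the intersection of y and w is w itself.
w = {"a1", "a3"}
print(w <= y and (y & w) == w)
```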

Suppose we have a system as described by $P^{00}$:















The columns of $P^{00}$ correspond to the precedence relations

$\mathcal{P}(a) = e, f, g, h$; $\mathcal{P}(b) = a, e, i$; $\mathcal{P}(c) = g, i, j, k$; $\mathcal{P}(d) = a, c, e, h, l$;
$\mathcal{P}(e') = e, f, g$; $\mathcal{P}(i') = a, e, i$; $\mathcal{P}(j') = i, j, k$; $\mathcal{P}(l') = a, c, e, l$.

The precedents are listed to the left of the matrix, labelling the associated rows,
whereas the succedents are listed above their associated columns. The rows of
$P^{00}$ indicate the succedents of the set corresponding to the row. Thus g is
seen to have the succedents a, c, e'. According to 25 we need not have e', i',
j', l' among the rows. A set which is indicated to have i, for instance, as a
precedent will in most cases use i' instead, following the rule 25.


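29 From inspection of $P^{00}$ (13) we find

$\mathcal{P}(e') \subset \mathcal{P}(a)$

$\mathcal{P}(i') \subseteq \mathcal{P}(b)$

$\mathcal{P}(j') \subset \mathcal{P}(c)$

$\mathcal{P}(l') \subset \mathcal{P}(d)$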


Further we see that b and d have no succedents. In fact their rows are empty
and have been deleted from $P^{00}$. They are terminal sets. Similarly e, f, g,
h, i, j, k, l have no precedents. They are thus initial sets.
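These findings can be reproduced mechanically from the precedence relations. The sketch below is only an illustration: the dictionary is our reading of the relations listed for the columns of the matrix, updated versions are recognised by a trailing prime, and the variable names are hypothetical.

```python
# Precedence relations for the columns of the matrix (set -> precedents).
precedents = {
    "a": {"e", "f", "g", "h"},
    "b": {"a", "e", "i"},
    "c": {"g", "i", "j", "k"},
    "d": {"a", "c", "e", "h", "l"},
    "e'": {"e", "f", "g"},
    "i'": {"a", "e", "i"},
    "j'": {"i", "j", "k"},
    "l'": {"a", "c", "e", "l"},
}

produced = set(precedents)                    # sets that have a column
used = set().union(*precedents.values())      # sets that have a row

initial = used - produced
# Updated versions stand in for their originals (rule 25), so they are
# not counted as terminal even though they have no row of their own.
terminal = {s for s in produced - used if not s.endswith("'")}

succedents_of_g = {s for s, inputs in precedents.items() if "g" in inputs}

# Precedence sets of updated versions contained in other precedence sets.
subset_pairs = sorted(
    (s, t)
    for s in precedents
    for t in precedents
    if s != t and s.endswith("'") and precedents[s] <= precedents[t]
)

print(sorted(initial))
print(sorted(terminal))
print(sorted(succedents_of_g))
print(subset_pairs)
```

Python's `<=` does not distinguish a true subset from equality; the precedence sets of i' and b are in fact equal, while the other three pairs are true subsets.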

The rest have an intermediate position. They can be ordered by levels. These
levels can be found by matrix multiplications involving $P^{00}$. 1) In our
example, however, we can obtain the ordering by inspection. We may want the
ordering, for instance, in order to find out how to draw the precedence graph
associated with P in a proper way. Thereby we want to draw sets which are
grouped by the subset relations as neighbours. We therefore look for ordering
relations for these groups. 2)

We want to place the terminal sets to the right. Both b and d are terminals.
However b is grouped with i' and d with l'. Further

$a < i' < j' < c < l'$

Hence we obtain the order (cf. also 14)

(a, e') < (b, i') < (c, j') < (d, l').
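The ordering found by inspection can be cross-checked by computing a level for every set. The sketch below uses a simple recursion instead of the matrix multiplications mentioned in the text. It assumes our reading of the eight precedence relations of the example and applies rule 25 in a simplified form: the test that y does not precede x' is left out, because no set of this example precedes an updated version that it would use. All function and variable names are hypothetical.

```python
from functools import lru_cache

precedents = {
    "a": {"e", "f", "g", "h"},
    "b": {"a", "e", "i"},
    "c": {"g", "i", "j", "k"},
    "d": {"a", "c", "e", "h", "l"},
    "e'": {"e", "f", "g"},
    "i'": {"a", "e", "i"},
    "j'": {"i", "j", "k"},
    "l'": {"a", "c", "e", "l"},
}


def with_updates(precedents):
    """Simplified rule 25: a set that lists x uses x' instead when x'
    exists (x' itself keeps x as its input)."""
    return {
        y: {x + "'" if (x + "'" in precedents and x + "'" != y) else x
            for x in inputs}
        for y, inputs in precedents.items()
    }


updated = with_updates(precedents)


@lru_cache(maxsize=None)
def level(s):
    """Initial sets are on level 0; any other set lies one level above
    its highest precedent."""
    if s not in updated:
        return 0
    return 1 + max(level(p) for p in updated[s])


order = sorted(updated, key=lambda s: (level(s), s))
print([(s, level(s)) for s in order])

groups = [("a", "e'"), ("b", "i'"), ("c", "j'"), ("d", "l'")]
group_levels = [max(level(m) for m in g) for g in groups]
print(group_levels)
```

The group levels come out strictly increasing, which agrees with the order of the four groups stated above.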


1) 1 La 1962-2. 

2) Corresponding to such groupings we will group together the processes
associated with these sets, see section 27.6.






We rewrite P in this order






In drawing fig. 3 the rule 25 has been followed. Further we have introduced
the simplification of drawing that one circle touching another one will have all
the precedents of that one and in addition those which are indicated by lines
flowing into itself. (Thus a has precedents e, f, g, and also h.) Arrows have
been drawn at the points of contact of two circles to indicate which of them is
to have also the precedents of the other.

In an information processing system a relation between an information set and
its precedents will normally correspond to an elementary process that will
take in a set of values of the precedence sets and produce a value of the set
considered. The information sets will thus normally be represented in the
system in the form of files. These we shall call elementary files, and the
symbols for information classes will henceforth mostly be considered to
represent elementary files.

The files commonly used in data processing systems are consolidated from a 
set of elementary files (although this way of looking at it may be new). One of 
the problems of designing an information processing system is to obtain an 
efficient set of file consolidations. 

The concept of consolidated masterfiles is often a very basic one in existing
data processing systems. The consolidation of the files is often (in all cases
the author knows of) advocated on very simple arguments, such as that it is
nice to have all data associated with one item (such as one person or one
catalogue number) collected into one and the same record. This way of
presenting the concept has many weak points. In the first place it does not
offer a basis for understanding and thus has low educational quality. Second
(and this is a consequence of the first) it is often not efficient to use a
masterfile concept such as the common one, as could be expected from the weak
argumentation and lack of understanding. In a well working information
processing system all information associated with an item is safely available
without having to be forced to belong to one and the same record of one and the
same masterfile.

It is therefore more rational to take into consideration the different
consequences for system efficiency of storing different data for the same item
in different ways before making the final design decision. It may, for
instance, happen that some data for an item are needed in one processing run
and other data for that item are not. They will then be subject to unnecessary
handling if consolidated into the same file. This may be compensated by other
advantages, but we have to show that, in each case, before deciding.
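A small numerical sketch may make the point about unnecessary handling concrete. Everything in it is hypothetical and chosen only for illustration: the three elementary files, their sizes, the two processing runs and their frequencies are not taken from the text, and only reading is counted, not writing back.

```python
# Hypothetical elementary files kept for the same items (sizes in
# characters per item) and the runs that need them.
file_size = {"price": 8, "stock_level": 6, "supplier_address": 60}
runs = {
    "daily stock update": {"stock_level"},
    "monthly price list": {"price", "supplier_address"},
}
runs_per_month = {"daily stock update": 22, "monthly price list": 1}


def transport(layout):
    """Characters read per item and month. `layout` is a list of
    consolidated files, each a set of elementary files; a run must
    read every consolidated file holding anything it needs."""
    total = 0
    for run, needed in runs.items():
        read = sum(
            sum(file_size[f] for f in group)
            for group in layout
            if group & needed
        )
        total += read * runs_per_month[run]
    return total


one_master_file = [{"price", "stock_level", "supplier_address"}]
separate_files = [{"price", "supplier_address"}, {"stock_level"}]

print(transport(one_master_file))
print(transport(separate_files))
```

With these made-up figures the single masterfile drags the long address field through every daily run. Other figures could favour consolidation, which is why the comparison has to be made case by case.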

Many of the information classes we are defining will have to be bundles of
still more elementary information kinds. For instance information about a
"job" may well be considered an elementary information class or an elementary
file. Yet we shall see 1) that to define it we need its identifier, size,
type and, perhaps, worker involved.


1) 25.4-10 



2. On the definition of elementary files.


1 When we discuss which information is needed at any point in a system (whether
it is the information needed for a decision in a main function within a firm or
a precedent, in turn, of such information) we have to define the information
needed as a set of more basic kinds of information which we shall call
elementary information sets, or elementary files. Although this does not appear
to be a significant problem we point to the fact that it is an instance of the
general systems analysis problem called by us "the definition of inner
boundary". It also conforms with our Fundamental Principle. 1)

We also give an example of a problem that may arise here. It seems natural
to consider a "conto number" as definitely a bottom level item, i.e. a term.
However we may have to be concerned whether in some process it might be
necessary to handle, for instance, the first third of it separately. Having
given this little warning we leave this subject and turn to the more difficult
one of determining which groups of terms we want to treat as elementary
files. 2)

2 Let us first see what reason we have for introducing this concept. One reason
is that we want to name and define such groups of terms - or groups of
elementary information sets - as are natural entities for being handled in the
most elementary processes we are thinking of.

3 What we have in mind here is to have names, and most desirably suggestive
ones, which are associated with well-known procedures within the actual field
of application. This would mean that on encountering such names in a block
diagram or flow chart for an information system we would know fairly well
what kind of procedures are being used and what data terms are carried in the
files. This is in extreme contrast to the present practice where diagrams and
descriptive texts only use ad hoc names both for truncated files containing
large numbers of different kinds of data and for processes consisting of large but
vaguely defined elementary computations.

As an example we may consider the precedence matrix P of a set of products.
It is used in the so-called "parts explosion" in production control. From our
familiarity with it from the system algebra its naming is immediately
suggestive of its meaning. In data files for production control the "P-data" are
usually hidden in large files called materials file or products file and these
names have different meanings in different companies (because the files have
different contents of other data than P which are also included in the file).
Only when all data in such a file are named as members of such intuitive
concept files, hence elementary files, do we know what the file content is and
what its use might be.


1) See section 11.10. 

2) Our insistence on breaking down to elementary files can be said to be 
based on working principles well known in engineering design. Counter¬ 
parts are also the breakdown to basic activities in planning networks and 
to work elements in MTM methods of industrial engineering. 





In American literature on production control the information corresponding to
our precedence matrix P is normally called "bills-of-material". A
bill-of-material is equivalent to a column in the matrix P.

4 The present practice makes it impossible to extract much useful information out 
of available diagrams. When one tries to come below the surface of present 
systems one either is not able to get details or else only gets the detailed lists 
of all elementary terms in which case it is extremely difficult to find out what the 
meaning of each of the terms is. 

5 If elementary files and computations 1) come to be used in a reasonably
standardized way, similarly to what is the case in physics, then the descriptions
could be much more intelligible. It is also much more feasible to standardize
on elementary files than on the consolidated files of different structure in
different firms.

6 Another advantage that will come out of an advance of the concept of elementary 
files and computations is that the possibilities of using standard programs or 
subroutines would be much improved. 

7 To give some indication of the feasibility of a use of elementary files and
computation in the way we propose, we may notice that the computation of parts
requirements by means of the precedence matrix P (or combinatorial description)
for the products and the demand vectors d^0, d^1, ... using computations of the
form P (.) d^i = d^(i+1) is an example of using elementary files, e.g. P and d^i,
as well as an elementary computation procedure, e.g. the
quasi-matrix-multiplication symbolized by (.). 2) 3)
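The parts-requirements computation of paragraph 7 can be sketched as follows. This is only an illustration under stated assumptions: ordinary matrix-by-vector multiplication stands in for the quasi-matrix-multiplication (.), the entry P[i][j] is taken to be the quantity of item i needed directly per unit of item j (so that a column is a bill-of-material), and the item names and quantities are invented.

```python
# Hypothetical example data, not taken from the text.
# P[i][j] = quantity of item i needed directly for one unit of item j,
# so column j of P is the bill-of-material of item j.
ITEMS = ["bicycle", "wheel", "spoke"]
P = [
    [0, 0, 0],   # a bicycle is not a component of anything
    [2, 0, 0],   # 2 wheels per bicycle
    [0, 36, 0],  # 36 spokes per wheel
]


def explode_once(P, d):
    """One explosion step: direct component requirements generated by demand d."""
    n = len(P)
    return [sum(P[i][j] * d[j] for j in range(n)) for i in range(n)]


def total_requirements(P, d0):
    """Accumulate d0 + d1 + d2 + ... until the demand vector becomes zero."""
    total = list(d0)
    d = list(d0)
    for _ in range(len(P)):  # an acyclic product structure needs at most n steps
        d = explode_once(P, d)
        if not any(d):
            break
        total = [t + x for t, x in zip(total, d)]
    return total


d0 = [10, 4, 0]  # external demand: 10 bicycles and 4 spare wheels
d1 = explode_once(P, d0)
requirements = dict(zip(ITEMS, total_requirements(P, d0)))
```

Here P and each demand vector play the role of elementary files, and `explode_once` plays the role of the elementary computation.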

8 Problem. In section 1, paragraph 13, it was mentioned that grouping of, for instance,
8" and 8''' would save two passes of file 8". While this was not directly visible from
two 1-s in the row 8" of the matrix we pointed out that the use of identical parts of
the file names, e.g. part "8", made it possible to detect this kind of potential saving
by grouping. Show that this problem can be solved by a formal rule applied to the
matrix. Also show that this rule works in similar situations where identical
name parts are not appropriate!

9 The problem of 8 is completely solved by formal matrix operations, when the
incidence matrix E 10 is used rather than P°° (cf. section 12.10).


1) See 27.1.1 for the definition of "computation".

2) See part 1. section 12.6. 

3) For some illustrations to the concept of elementary files see sections 4 
and subsequent ones. 
3. Inference problems in information systems design.


The problem of drawing inferences is a very central one in the design of
information systems. For one thing it is one of the main functions of an automatic
information system to draw inferences from data available, in an automatic way.
The most obvious example of this is when formulas are evaluated in regular
numerical processes. The inference concept however, as we intend to use it, is
more general and will be associated with any kind of process which makes use
of some input information to produce output information and which also will produce
an estimate of the probability of the output being the "true" outcome of a situation
defined by the input.

Also in other ways inference can be an important function in information system 
design. 

When we have emphasized earlier that the precedence information associated 
with a certain output information should be determined in a way more direct 
than by working out formulas connecting the precedence information with its 
output, this is to say that an inference should be made in the "backward” sense 
that the precedents would be the output of such an inference function when the 
desired output information is given as an input to the inference function. 

We made the statement that probably no direct procedure or formula could be
established that would determine the precedents for a given output information.
It shall be defined on an "a priori basis". 1) We also suggested however that by
establishing lists of already worked out precedence relations these could be
used as a guide for subsequent precedents determination work. (This of course
will also require that a standardization of terminology and concept definitions
has been established). In our present context (of inference) we can state our
suggestion in the form: to define a relation

P(x) -> x

is an inference problem using a priori reasons and possibly lists of earlier
established precedence relations as its "basis".

If such an inference process should be worked out to be automatized it should 
be such as to accept precedence lists and a priori definitions. Together with 
the latter should be given some estimate of its degree of "incompleteness”. 

The output of such a procedure would be a precedence list for the actual infor¬ 
mation together with probability values for its correctness and completeness. 

If the latter are low this will point to a need for more intuitive work. 

In one obvious way an inference process like the one described can be improved
during its use, thus exhibiting learning properties to some degree. This is by
extending its set of lists of precedence relations for different classes of
problems. To make this evolution automatic is not too difficult. There is however
also another way in which it should be possible to use an automatic information
system to improve continually the inference process. Thus the best possible
way to find improvements to the methods of making inferences, and to define
information needed for this, should be obtained by having an information
system record information regarding the outcome of all trials of the
inference process. This would be a basis for further research, which would also
be more efficiently performed when it could have available ample information and
automatic processing facilities for its use.


1) 25.1.4.
4. A Further Illustration of Information Precedence and Elementary File
Definition. Computation of weekly salary.

1 Let us assume that the salary will consist of components such as

hourly pay
piecewise incentive (Swedish: styck-ackord), piecework rate
group incentive (Swedish: lag-ackord), gang rate

The worker may have worked a certain number of hours on an hourly pay basis
and also a specified number of hours on each of the two assumed kinds of
incentive. The system may well have to be able to cope with more types of pay but
we disregard this at the moment.

2 We have so far established the precedence relations for information

hourly pay, for week                        -> weekly pay
piecework rate (piecewise inct.), for week  -> weekly pay
gang rate (group inct.), for week           -> weekly pay


3 To obtain the hourly pay for week for the worker, we need to know the number 
of hours in the week, worked under hourly pay. However there may be various 
types of work for hourly pay. Thus the day of the week as well as the hour of the 
day may determine the hourly pay rate. We need not at the moment specify all 
types that may occur but then it is necessary to provide for an unspecified num¬ 
ber of "hourly pay work types". 


4 We get the precedence relation 



5 For these precedents we need again to define their precedents. The first of them,
the hourly pay rate, will as a rule be constant for a number of weeks and will
therefore, in general, be taken from a standing file. This however does not
eliminate the need for having a procedure for computing these values. Therefore
precedents of the hourly pay rate need be defined. We leave this however and
turn to the next item: work done during week, of type 1j (where j stands for
any of the values 1, 2, ...). Before we can proceed now, we have to be
specific about the flow of information. To talk only of the kind of information, as
we have done so far, is no longer sufficient. The reason for this is that the
work to be paid is normally performed in a time sequence of well defined single
jobs.

6 We see that we need to have some information for each job done, so that we know

to which week the job belongs
to which type the job belongs
which number of hours

?££ cal: wwch Kfho s uX to p:r ve (plecework) ot on ^ w “ d 

7 In the alternative that the work type is determined by the time of day we need

indication whether hourly pay or incentive (piecework)
start of job, week, day, hour etc.
finish of job, week, day, hour etc.

8 The choice between the alternatives 6 and 7 here brings us into a decision
situation. The decision may be reached by considering questions which are for other
people to decide - or for us to decide in another context.

Both these cases present an illustration of a point in the systems work where
documentation is necessary: which decision is taken, why, by whom?

dUii” 3 umen r taUo e n SyateraS 10 ^ “ d oareft ^ such 

8a Remark. The decision situation mentioned in 8 apparently most often is regarded
as being rather a "knowledge situation" where knowledge of the situation present
in the system (the company, for instance) indicates the choice to be made.
This is not a lucky way of handling the problem however for it is entirely
possible that the present situation is not the best one. Only the explicit
presentation of each "point of choice" in the system design as a decision problem
for the [...] is therefore [...]. Thus rather than trying
to find out what is in use to-day the problem is to find out what should be in use.
The systems analysis therefore is to be concerned about clear presentation of
all decision points by careful analysis and documentation and to find out what
information is needed for each decision, again a matter of information precedence
relations. This, then, should indicate which people or groups are involved. The
present way of emphasizing the explorative part of the system analysis work
therefore seems to rather misinterpret the problem.

8b In view of the arguments of 8a, "company knowledge" should not [...].
The man who "knows the company" may, more easily than other
people, overlook the decision points. Company knowledge will, of course, be
made to enter when the decision question is made the subject of a discussion
with the proper people.

8c Remark. The decision point may well have to be replaced by an explorative one
if the problem is not to design one specific system but rather to design an
information processing system which will have to serve several different systems.
Thus if our problem is to serve different systems we cannot obtain decisions
easily. This would involve standardizing conferences which, while well justified
in themselves, may take too much time. We then may want to resort to an
explorative study to define which different decisions have been made in the different
systems concerned. Then the information processing system will have to be
designed to cover all variants.

9 Let us assume that the second alternative (e.g. 7) has been chosen, for instance
because an automatic registration of time points is available to the worker.

10 We have now come to find that one of the kinds of information needed, e.g. the
information on an individual job, consists of a record giving one value for each
of four different kinds of information: 1)

job:
    employee number
    main type (whether hourly pay, piecework rate /piece inct./
               or gang rate /group inct./)
    time for start: week, day, hour, ...
    time for finish: week, day, hour, ...

11 Thus this information comes in a bundle of different kinds of information, and 
in a sequence of groups of values, one for each kind, all values associated 
with the same physical item (the job). 

12 The sequence of such value groups is a sequence of records and is logically
a file. It is an elementary file in that it contains a group of different
kinds of information which are required simultaneously to specify an elementary
logical entity.
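As a minimal sketch of paragraphs 10-12, the elementary file "job" can be pictured as a sequence of records, each carrying the four kinds of information listed in 10. The class and field names, the encoding of the main type as a string and the sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moment:
    """A point in time given as week, day and hour (hypothetical encoding)."""
    week: int
    day: int      # 1..7
    hour: float   # hour of the day, e.g. 7.5 for 07:30


@dataclass(frozen=True)
class JobRecord:
    """One record of the elementary file "job" (paragraph 10)."""
    employee_number: int
    main_type: str  # "hourly", "piecework" or "gang" (assumed coding)
    start: Moment
    finish: Moment

    def hours(self) -> float:
        """Duration in hours, assuming start and finish lie in the same week."""
        days = self.finish.day - self.start.day
        return days * 24 + (self.finish.hour - self.start.hour)


# The file is simply the sequence of such records (paragraph 12).
job_file = [
    JobRecord(17, "hourly", Moment(23, 1, 7.0), Moment(23, 1, 11.5)),
    JobRecord(17, "piecework", Moment(23, 1, 12.0), Moment(23, 1, 16.0)),
]
```

All values in one record are associated with the same physical item, the job, which is what makes the group an elementary file rather than a consolidated one.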

13 We have called this group of information a job value or a job file.
We can look upon the elementary file "job" in different ways.
Perhaps the most basic one would be to regard it as a set of different files
"job ij", e.g.

job 01, job of piecework rate (piece incentive) type
job 02, job of gang rate (group incentive) type
job 11, job of hourly pay type 1,
job 12, job of hourly pay type 2,


1) Note that we are assuming the situation, mentioned in 7, that "type" is
determined by the time information. Hence only "main type" need be given
in the record.
14 Another way of looking at this is to regard "job" as one single elementary file
used by a process which will follow different lines depending on the type.


15 In the first case we get the precedence relations

job 01 -> work done during week, type 01
job 02 -> work done during week, type 02
job 11 -> work done during week, type 11
job 12 -> work done during week, type 12
...
job 1n -> work done during week, type 1n


16 In the second case we get

job -> set of jobs done during week, tagged by type (01, ..., 1n)
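The two ways of looking at the file "job" (paragraphs 13-16) can be contrasted in a short sketch. The record layout (employee, week, type tag, hours) and the sample values are assumptions chosen for illustration; the type tags follow the "01", "11", "12" naming used above.

```python
from collections import defaultdict

# Hypothetical job records: (employee number, week, type tag, hours worked)
jobs = [
    (17, 23, "01", 4.0),
    (17, 23, "11", 4.5),
    (17, 23, "11", 3.0),
    (17, 23, "12", 2.0),
    (17, 24, "11", 8.0),
]


def split_by_type(jobs):
    """First case: regard "job" as a set of separate files "job ij"."""
    files = defaultdict(list)
    for record in jobs:
        files[record[2]].append(record)
    return dict(files)


def work_done_during_week(jobs, week):
    """Second case: one file "job", each record tagged by type.

    Returns the total hours per type tag for the given week.
    """
    totals = defaultdict(float)
    for _employee, job_week, type_tag, hours in jobs:
        if job_week == week:
            totals[type_tag] += hours
    return dict(totals)


files = split_by_type(jobs)
week_23 = work_done_during_week(jobs, 23)
```

Both views carry the same information; which one is preferable depends on how the succeeding computations use it.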


17 In addition to the job specifications produced by each worker for each
job it would be wise to have something of a jobtype [...]
the standard number of regular hours for the week. [...]

18 [...] different information structure. The different types within the group
"hourly pay" use the same kind of process, only taking different values of pay rate
for the different types. (Admittedly this is an assumption and is therefore to
be made subject to an explicit decision procedure in any realization.)

19 The elementary file "job" contains information 1) given by data reported by the
worker or his department. We have thus come to the terminal point of the process
of break-down of information into sequences of precedence sets.


20 In fig. 1 we have compiled the precedence conditions so far discussed.




1) e.g. start time and finish time.








21 ^ tiCe that ,“ 41118 analysis of information precedence we have followed 

processes'but j<ist 0 ^a^i 1 for 1 " I o^caP°and ’^^ur^Mnfifmmtfon^need?*' 1 ^^ 011 
5. Distribution of costs as another illustration of discussing elementary
files.

tively be a department or work station, 

2 We may thus want to give to ea f b i° b ““ l^i^'fle^trill^m'areootd o£ a 

get several different such elemen ry ' u u reoeotitet indicates 

rssissr 

3 The elementary file for one kind of object assignment thus will contain 

job identification 
object identification 

it— - -°z£;'.zn ssrsx 

depending on the decision taken regarding 4.6 or 4.7. For instance we
get

4 worker number 

start of job, week, day, hour, etc. 
object identification. 

5 It is clear that much of the ^ ata we to waste''some 

within each hourly-pay type over the week and then top y^py 

r.'.." — - «>■* ”■* '*• 

be chosen. 
been defined and their specific requirements established.


as a grouping of two computations - for arithmetic ^ ematlve can be regarded 
transport reasons Thia ^ rithmetic reasons and not for data 

needs defined by "cost distribution" ** ° Ut ^ toeffloient because of the 

9 b ef r T 7 rker Sterts ‘be job in order to 

vide reference IZ^lrT^ " ** ^ ** 3 ° b fo pro- 

[...] been forced to change some decisions taken in section 4. This is a kind of
change that is often not detected until computer programming [...], which
of course is extremely expensive. To establish P°°, the information precedence
matrix, as completely as possible, before programming is started makes [...]
-—- 

cr.ssst^s“s.' 1 “ ,T m “ 


10 . 
6. Identification of Precedence Information.


. . • t>oo i o defined bv all the direct 

stsssestsss 'sszzzzzz sz 

of that information class. 

2 As * this way several differs ^ps - ^e **£]££ STS: 
class - or the same in the resulting P°° matrix 

same information class, while one 

and the same row should be used. 

This problem is a central systems [...]
to establish a common terminology and to teach all groups in its use. A thorough
documentation will also be necessary.

except for the very simplest ones, it 13 ^^“ss - or an elementary 

ttiSSZ ,=>»” SSSSSL. w • —- - 

its meaning. 

in such descriptions). 

, .. -r.,:- pvprv "user” of an information class as one 

4 As a further checking possibility every u3 u t for that class as soon 

iTLxzszst asprssa -—-■■* 

going to contain exactly what he wants it to. 

term. 
7. Use of the Information Precedence Matrix P°° for Compatibility Checking.

1 We have seen that in P°° we have recorded complete information about all
precedents of an information set X and, therefore, of comp (X). This is important
for it makes it possible, when we design a procedure to realize comp (X), to go
back to all precedents to check that the process for comp (X) takes care of the
formats in which the precedents are presented. If this is not done an
incompatibility situation may occur which may call for changes at several places in
the system. Again in this way we detect this fact before programming has to be
done. 1)

2 Still more important is, perhaps, that we can just as easily use P°° to identify
all succedents of an information set X and thus obtain complete knowledge of
all computations which are to use X as an input. This, of course, is necessary
in order to be able to make sure that the format in which X is produced by the
process for comp (X) will be such as to suit all succedent computations.
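The checking described in paragraphs 1 and 2 can be sketched as a scan of one column and one row of P°°. The indexing convention (P[i][j] = 1 when set i is a direct precedent of set j), the information set names and the format labels are all assumptions introduced for this illustration.

```python
# Hypothetical information sets and precedence matrix.
NAMES = ["job", "pay rate", "work done", "hourly pay", "weekly pay"]
IDX = {name: i for i, name in enumerate(NAMES)}
N = len(NAMES)
P = [[0] * N for _ in range(N)]
for pre, suc in [("job", "work done"), ("work done", "hourly pay"),
                 ("pay rate", "hourly pay"), ("hourly pay", "weekly pay")]:
    P[IDX[pre]][IDX[suc]] = 1  # assumed convention: row = precedent, column = succedent


def precedents(x):
    """Direct precedents of x: scan the column of x."""
    return [NAMES[i] for i in range(N) if P[i][IDX[x]]]


def succedents(x):
    """Direct succedents of x: scan the row of x."""
    return [NAMES[j] for j in range(N) if P[IDX[x]][j]]


# Assumed format documentation: what each process produces and what it expects.
produces = {"job": "record", "pay rate": "decimal", "work done": "hours",
            "hourly pay": "decimal", "weekly pay": "decimal"}
accepts = {("work done", "job"): "record",
           ("hourly pay", "work done"): "minutes",   # deliberate mismatch
           ("hourly pay", "pay rate"): "decimal",
           ("weekly pay", "hourly pay"): "decimal"}


def incompatibilities():
    """All (precedent, succedent) pairs whose documented formats disagree."""
    return [(NAMES[i], NAMES[j])
            for i in range(N) for j in range(N)
            if P[i][j] and accepts.get((NAMES[j], NAMES[i])) != produces[NAMES[i]]]
```

Because every direct relation is one entry of the matrix, the scan stays complete even if the matrix is kept on several sheets or on a file storage.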

3 Note that while this possibility does exist in principle also in our information
precedence graph it is not a practical possibility. Note further that in the
common flow-charts or block diagrams of data processing analysis the
possibility does not exist even in principle.

4 The impracticality of using the precedence graph for finding the totality of
direct precedents or direct succedents stems from the fact that the practical
systems are so large that the diagram cannot be put onto one piece of paper.
It has to be partitioned into several sheets. The cross-references then necessary
cannot easily be put in the systematic way necessary for complete checking. Note
that this problem does not exist in using P°°, even if it is also partitioned onto
several sheets. Why? Of course the best thing to do in this connection would
be to store P°° on a file storage and let a computer do the scanning.
This is however not a necessity. P°° lends itself well also to manual work.

5 The reason why the information precedence relations are not available from the
common flow-charts is that these show only the way different consolidated data
files are used as input or output to complex processes consisting of several
computations. It is not possible to see if an individual one of the elementary
information sets contained in such a file is used or not in that process.

6 It is important to observe that the precedence and succedence relations between
information sets are often much more involved, when real requirements are
considered, than is recognized to-day. Present-day systems are oversimplified
in this respect because information needs are seldom satisfied. One reason is
that a systematic way of establishing all these needs has been lacking so far.
P°° is such a means however. Another reason is a psychological one. Information
is often used as a tool for acquiring power within an organization. Then a
desire for reserving access to it may be followed. Of course this is a tendency
that has to be counteracted in any rational design of an information system. Top
management must check that such desire is not permitted to act within the
system design staff itself.


1) The importance of the incompatibility checking is high in any system design.
It will become even more important in large "on-line-real-time" systems
of the future. A central group of one or several people must be given
responsibility for this checking and for the documentation and oral discussions
necessary for efficient solution of this co-ordination problem.

7 One minimum requirement for making the existing information known to
everybody is to have an explicit documentation of every information item existing
in the system with explanation of its source, meaning and use. Thus for each
data term in every file a document must be made up giving all this information.
It is the file of these documents which, together with P°°, gives all basic
information about the information system in its "logical" existence. (The file will
then of course also give reference to physical storage location.)

8 What has been stated in 7 does, of course, not mean that management may not 
have good reasons to classify some data and be able to make their availability 
restricted. 

9 Exercise 
Assume quantitative estimates have been made of the utility of the terminal
information sets of this information system. (This may have been possible
because the terminal sets control important functions of the controlled system).
Use notations such as u(x) for the utility of the information set x and u(x ∪ y) for
the utility of the set of information sets x and y. What expressions can be given
for the utility of the different initial information sets or of groups of such initial
information sets? (Assume, for instance, that if x and z suffice for producing
the terminal information y, then u(x ∪ z) = u(y).)
8. Some other uses of the precedence matrix P.

In the analysis of an information system it will be important to trace, for any
information that is produced, all its precedents of all orders. This may be
required for instance in connection with auditing because in this way it is found
at what places in the system there is a possibility to manipulate and falsify the
information produced. It is obvious from the system algebra, chapter 22, that
matrix operations made upon the precedence matrix of information sets in the
system can be used to define all precedents of each information set in the system.

It is also important to be able to trace in the opposite direction, for instance, in
order to point out where in the system errors may occur as a consequence of an
error in an input record. This, from system algebra, can be computed by the
transpose of the precedence matrix P, e.g. by the succedence matrix S = P^T.
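Tracing precedents of all orders, and tracing in the opposite direction, can be sketched with a boolean closure of the matrix. The closure below uses Warshall's algorithm as a stand-in for the matrix operations of the system algebra; the convention P[i][j] = 1 for "i is a direct precedent of j" and the four-element example are assumptions made for illustration.

```python
def transpose(M):
    """Succedence matrix S as the transpose of the precedence matrix."""
    return [list(col) for col in zip(*M)]


def closure(M):
    """Boolean transitive closure: R[i][j] = 1 if i precedes j at any order."""
    n = len(M)
    R = [[1 if v else 0 for v in row] for row in M]
    for k in range(n):
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = 1
    return R


# Hypothetical system: 0 -> 1 -> 2 and 3 -> 2.
P = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
R = closure(P)

# Auditing: every place where information set 2 could have been influenced.
all_precedents_of_2 = [i for i in range(len(P)) if R[i][2]]

# Error tracing: everything that an error in input record 0 may reach.
affected_by_0 = [j for j in range(len(P)) if R[0][j]]
```

Applying `closure` to the transposed matrix gives the same result as transposing the closure, so either direction of tracing can be obtained from the one stored matrix.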

The precedence matrix P is also a valuable tool in the planning of programming
and testing the programs and subroutines of a large computer program system.
A very successful application of this kind was made in the construction in
Sweden of a powerful programming system for handling geometric information, for
instance, in connection with engineering design. This system is able to accept
graphical information and to store on mass storage, such as magnetic tapes,
the data of any geometric contour defined by graphical or numerical data
together with a name given to it. This system is also integrated with a system
for mathematical design of engineering objects, such as ships, and contains
an open set of subsystems for generating control data for numerically controlled
machine tools.

The program system could be made in Algol, thanks to the very powerful
procedure features of this language. Thus no new compiler had to be made and the
system automatically got all the general properties of Algol in addition to its
own special features. It was possible to make all subroutines for handling the
geometric design as Algol procedures which could themselves be written in
Algol. As the system had to be able to combine simple contours having one or
two curve segments with designs consisting of thousands of curves a dynamic
memory allocation subsystem had to be developed. This could also be made as
a set of Algol procedures which could be written mainly in Algol. For this to be
possible a basic set of Algol procedures written in machine oriented code was
designed and used by the other procedures.

To test-run the complicated set of procedures it was important to start by
testing those subroutines which were initial elements in the precedence system
made up by all the routines and their mutual precedence relations. These initial
routines were identified by the precedence matrix. Next all routines which had
only the initial elements as precedents could be tested and then one could handle
those routines which had all those mentioned as precedents and so on. It follows
directly from the system algebra of chapter 2.2 how this can all be planned by
the aid of the precedence matrix P for the set of procedures.

In the construction of the compiler for the general purpose programming
language Algol Genius 1), combining the Cobol data handling features with Algol, a
similar technique for detailed planning was used and found to be very efficient.

1) La 1964, Langefors, B.: Algol Genius, BIT, Bind 4, Hefte 3.
1. Data and information files. 1)


There is a widespread confusion as to the relation between the concept of
information and that of data. Often "data" is taken as equivalent to "information"
while in other cases the distinction is that "data" would stand for process input
and "information" would stand for output. We find the latter kind of distinction
a very impractical one for any theory of information systems. The reason is
simple: most information produced by one process is taken as input to another
one, and often to several. On the other hand any thorough analysis of a system
handling data needs to make a distinction between the information and the data
used to record or process this information. We therefore use the following
definitions.

1 Definition of information: Information is any kind of knowledge or message
that can be used to improve, or make possible, a decision.

2 Definition of data: A datum is a representation of one quantity of information
in digital form. "Digital form" is used here to mean any representation by
a finite set of different symbols, usually - but not necessarily - in the shape of
decimal digits or alphabetical letters or binary electrical signals.

The formal definition we have introduced for data seems to us to agree with
most informal usage to-day. Notice that it means that an analogue computer
is not a data processor and also that information representation by analogue
signals like voltage is not a data representation.

An analogue computer is still an information processing machine and this broader
scope of "information processing" as compared with "data processing" is one of
the motivations for using the concept of "information processing".


1) Cf. section 23.6. 
2. Size of data terms and precision required.


Once we have agreed upon the definition of data as a general, digital
representation of information we encounter the question of determining how many digits,
that is how many symbols out of a given set, are needed to represent a given
piece of information. To get an understanding of how to answer this question we
make the observation that when we talk about individual "pieces of information"
these are, in general, associated with properties of an object. Both "property"
and "object" are to be understood here in a very general sense. Thus "property"
may be any measurable property, such as length, weight, cost, or a
classification such as being a part of a certain kind being manufactured, or an
identification as for instance "the 32nd individual part with part number 4387
manufactured to-day".

1 To give information about a property of an object we see that we need four 
kinds of information, 

1 identity of object 

2 kind of property we want to 
specify for the object 

3 the specification of that 
property for that object. This 
is in information processing 
often referred to as the value 
of the property. 

4 the point in time, 

2 The first three kinds of information can be seen to be associated with sets. Thus to
give information about identity one gives information which is equivalent to
pointing at one element in a list of individuals. Similarly to indicate which kind
of property we are going to talk about we name the kind of property from among
a list or a set of properties. Finally to give the value for the kind of property
we again have to identify within a list of admissible values the element to be
associated with the actual object. This is obvious in the case that the property
value is not a measurable magnitude. Thus if the kind of property we want to
talk about is the profession, the object being an identified man, then the
property value is the man's actual profession, as a physician, let us say. When stating
this we identify the property value by identifying within a list of all professions
that element which is called "physician".

If the property we talk about is a magnitude it may not be so obvious that the 
value is actually specified in a way that is in essence again an identification 
with a unique element in a finite, well defined list. To see that this is so we 
need only to recognize that even if the property value is a result of a measure¬ 
ment it is obtained by identification within a list. Thus for the kind of property 
we want to measure we can always give an upper limit for its possible magni¬ 
tude - even if giving the lowest upper limit may be difficult. It is equally well- 
known that there is also some lower limit for the difference between values 
that can be indicated by the measuring instrument - or rather the class of all 
instruments that may come to use for the evaluation in question.
These bounds however specify a finite number of values which can be regarded
as individual elements in a well-defined finite set.

3 We see that we have to distinguish between kinds of property and property
values. A property value is defined by identification within the set of values
of that property. Thus to each kind of property we associate a property value
set. The property kind is in itself a value identifying the kind within a list of
(feasible) kinds of properties.

We see that the information about an object is given by a set of values: one value
identifying the object, another value identifying a property, a third value
identifying the value within the value set for this property, a fourth value
identifying a second property, and so on.
4 associated 71 it! ^ “ ° f W* values 


We shall often call a single data representation for a property a "term",
sometimes a datum or an elementary item (as in Cobol).


10 


The storage space allocated for the storage of a term will be called a "field".

6 The length of a term will be determined by its actual value. It may thus be
shortened if the alphabet by which it is given digital representation contains a
character which may be omitted if it occurs in certain positions in the word
giving the actual property value.

7 It is therefore possible to use varying field length for term storage. This will
require some kind of indication of where a term starts or ends. This requires
some redundancy, either by permitting one digit within the alphabet to serve as
a separator or by having storage to hold the term length.

It is possible instead to use a constant field length for a term. It must then be 
equal to the maximum term length and is thus determined by its maximum value 
or by the number of elements required in its property value set. 

8 To be precise we state that the minimum value possible for the maximum
term length is the number of bits (or binary digits) needed to represent the
number of elements contained in the property value set. This is true if the physical
storage technique uses 2-valued signals as its basic operating elements. As this
is practically always the case and as the binary representation has distinguished
use in information theory we prefer to take the statement as generally true.

It is important to note that when we have been talking about values of measurable
kinds the precision we need to use during computation may be much higher than
that corresponding to available physical measuring equipment.

This is associated with the fact that we will often have to compute values by
taking differences between two numbers of almost equal magnitude leaving a
result with much fewer digits. The problems associated with such kinds of
"ill-conditioned" computations are very important and must be carefully
analyzed when an information system is to be designed. Else the accuracy of the
results may become none at all.

11 Also the "ill-conditioning" may well be the result of an inappropriate system 
design. 
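
The loss of digits described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name and the sample numbers are hypothetical, and Python's decimal module is used merely to imitate a computation carried out with a limited number of significant digits.

```python
from decimal import Decimal, localcontext


def rounded_difference(x: str, y: str, digits: int) -> Decimal:
    """Subtract y from x after rounding both to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        # Unary plus rounds a Decimal to the precision of the current context.
        return (+Decimal(x)) - (+Decimal(y))


# Two numbers of almost equal magnitude (hypothetical values).
coarse = rounded_difference("1234.5678", "1234.4321", 4)
fine = rounded_difference("1234.5678", "1234.4321", 8)
print(coarse, fine)
```

With four significant digits the difference comes out as 1, while eight digits give 0.1357, so the coarse result carries no correct digit at all.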

12 Very often the maximum term length used will be much longer than the
minimum one of binary representation. Decimal representation already makes for
an increased length with about 20 % (the code allowance).
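
The 20 % figure can be checked with a short sketch. The function names are hypothetical, and the assumption that each decimal digit is stored in 4 bits follows the record tables used later in the text.

```python
def min_term_bits(value_set_size: int) -> int:
    """Smallest number of bits that can distinguish `value_set_size` values."""
    return (value_set_size - 1).bit_length()


def decimal_term_bits(digits: int, bits_per_digit: int = 4) -> int:
    """Bits used when every decimal digit is stored separately (assumed 4 bits)."""
    return digits * bits_per_digit


digits = 6
binary_bits = min_term_bits(10 ** digits)   # 20 bits for one million values
decimal_bits = decimal_term_bits(digits)    # 24 bits
print(decimal_bits / binary_bits)           # ratio of decimal to binary length
```

For a six-digit term the ratio is 24/20 = 1.2, in line with the code allowance quoted above.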

13 More significant allowance to term length is caused when the value, that is the 
association with the proper element in the value set, is not given by a number 
indicating the position in the value set, but a combination of groups of digits 
is used, for instance, for ease of memorizing or for giving some means for 
"understanding" the value. This is the case for instance with person names. 

14 When more of the information handling is being done automatically the advan¬ 
tages of structured numbers or alphabetical data is reduced. At the same time 
the cost for the extra space they need is more significant. Hence a reduction of 
this kind of representation is a natural consideration when automatic informa¬ 
tion systems are being designed. 
Part 2


Chapter 7. Files, computations and processes









It is common today, when doing systems work for data processing, to define a
run which is supposed to handle, in one computer pass, all transactions
associated with a certain master file. Later, when a lot of computer
programming has been done - and a computer ordered - it is then found that
memory or I/O-equipment considerations require the run to be broken
up into two or more smaller ones. This increases processing time, of course,
and it may be found - too late - that the equipment cannot do the job it was
planned to do or that, at least, expensive reprogramming has to be done. This is
common. Good systems work requires that the systems designers understand
these problems early, as well as more rational ways of handling them. We are
going to present here an analysis of how this can be achieved.

Likewise largely consolidated masterfiles are, as a rule, designed from the
beginning of a systems work. We here take up a more appropriate way of doing
this, with regard to the consequences of consolidating more and more data into
the same file.


1. Files and processes.


We have seen that as a rule a certain kind a of information is obtained by some
combination out of a set P(a) (= the precedence set of a) of other kinds of
information. We shall use the word computation to denote the set of all procedures
which take P(a) and produce a. In general it is wise to assume the possibility of
producing different procedures. For instance one may know a procedure which
will give an approximation to a, while one is still in search for another
procedure which would produce "better values of a". We therefore use the word
computation for a for the set of all feasible procedures for getting some
approximation to a and write for short Comp(a). It is important to try to define P(a) with
regard to the process for a, and not be satisfied with a subset of it which might
be sufficient for one of the feasible procedures of Comp(a). We may also
find it convenient to refer to the precedence set P(a) of a as the precedence of
Comp(a), denoted as P Comp(a). Also a may be regarded as a member of the
succedence set S Comp(a) of Comp(a) and also as a member of the succedence
set S(P(a)) of the precedence set P(a) of a.


The information in the succedent set a of a process, as well as that in the
precedence set P(a) = b, c, d, ..., will consist of a set of values of one information
set, often called the key, and for each value of the key a group of corresponding
values of the information classes contained in the set. One package consisting
of one key value and the corresponding values within the set will be called a
record. Thus for any value k of the key there may (or may not) be a record a_k
of the output information set and a record b_k, c_k, ... of each input set. A set
of records for a set of k-values will be called a file and we shall in most cases
consider a, b, c, ... to be the files of all a_k, b_k, c_k, ...

In some cases it will be important to make a distinction between "standing files" 
(or files proper) on one hand, and files consisting of temporary data such as input 
data, output data and intermediary results on the other hand. We shall then use 
the word "transfile" for the files of temporary data. 


To each precedence relation there is associated a computation. The actual pro¬ 
cedure may however be designed in a variety of ways. Thus whereas P or 
its graph is a unique representation of the system of computations, there may 
be different systems of actual procedures for the implementation. 


Among the computations will also be sorting operations for a file, necessary 
between two computations proper. Computations will also have to be introduced 
merely for data-handling reasons. 

The number of entries ≠ 0 in any row of P^00 equals the number of computations
for which the corresponding data set is an input. Thus for instance P^00 of 5.1-28 1)
shows that e (or e') is used as input in six different computations. If the
computations are all taken as separate computer runs we thus will have a multiplicity
of data input transport for all files. This multiplicity may be even higher than


1) Recall that a number like this refers to item number 28 in section 1 of 
chapter 5 (in this same part, hence part number is omitted). When we 
refer to a page we say so. 
that obtained by counting unit entries in P^00, because some computations
may call for multiple scanning of some file. We may indicate this by replacing
the units in P^00 by the numbers giving the multiplicity of scan. There may be
also a multiple scan of an output file. If however the matrix is replaced by the
graph 1) one can show the multiplicity of both input and output. We return to
this point later.


1) As shown, for ex., in fig. 5 . l-i. 
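
The counting rule for the precedence matrix can be sketched as follows. The matrix is a small hypothetical example, not the matrix of chapter 5, and the function names are assumptions; rows stand for data sets and columns for computations, as the rule in the text implies.

```python
def input_multiplicity(p):
    """Number of computations using each data set: non-zero entries per row."""
    return [sum(1 for entry in row if entry != 0) for row in p]


def scan_counts(p):
    """Scans per data set when unit entries are replaced by scan multiplicities."""
    return [sum(row) for row in p]


# Hypothetical 3-by-4 precedence matrix; the entry 2 marks a double scan.
P_EXAMPLE = [
    [0, 1, 1, 1],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
]

print(input_multiplicity(P_EXAMPLE))
print(scan_counts(P_EXAMPLE))
```

The first list counts separate runs that read each file; the second also accounts for runs that scan a file more than once.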
When the file information is used in

reby) it will typically have to ^eran^ ^ of the ^formation system will 
renTe S o"ed° with cruizing the ffle TL^mo^toC 81118 

tion ifneeded as an input for a process g-^ — t" on 

as an output (or even merely as a copy) from apmce transport equipment, 
the size of the file and on the performance of its store and transpo q f 

The size of a file will, 

alphanumeric character of 6 to 8 Pits, we <jone ^ 

sssAtZA* y«-;.«£=s: rri 

of fixed size (for a specified computer) are to be use S 1 

decimal digits corresponding to the same number o 

s^-js=as===^«“= 

alphabet we want to use. 

Thus if we are satisfied with block letters^ «have 

T^tT:T 3 £ character - whether 6 = - •fig?* 
in this case. Instead if we require to use also minor letters then bits, 
least, are to be accounted for. 

snssiTiSitsas ssss —a 

corresponding to the different character quantities. 

2 Example. A record containing 10 6-bit characters, 50 decimal digits and 3
logical terms has size: 60 + 50 · 4 + 3 = 263 bits.
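
The arithmetic of this example can be written as a small sketch. The function name and the default of one bit per logical term are assumptions chosen to reproduce the figures given in the example.

```python
def record_size_bits(n_chars: int, bits_per_char: int,
                     n_decimal_digits: int, n_logical_terms: int,
                     bits_per_digit: int = 4, bits_per_logical: int = 1) -> int:
    """Record size in bits from counts of characters, digits and logical terms."""
    return (n_chars * bits_per_char
            + n_decimal_digits * bits_per_digit
            + n_logical_terms * bits_per_logical)


print(record_size_bits(n_chars=10, bits_per_char=6,
                       n_decimal_digits=50, n_logical_terms=3))
```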


1 ) Cf. however section 23.6 

Thi« h,, the • , peeds 80 that comparison can only be made on a time basis 

media will ^dy ^ “““ Sy8tem beoau8e “s different file 

frefer^rr^K *7 tUS W ® conslder most cases a fixed period of time - 

• Sil=SSS= 

^Sr~S£SrSS~2 

4 Example: Which will be the average size of a file record which consists of one
alphanumeric term of a (dynamically) varying size between 3 and 30 characters
(average size 10), ten numbers of size 7 digits with all values equally
probable, a code with only two values, and a last term of size one character in
95 % of the cases and two characters otherwise? A term separator
(size one character) is required in the storage of each term for which
we choose to use a dynamically varying field. Only that
separator is required if only one record type is used in the file, while all terms
have separators if other record types are also used in the file. An
alternative ... 1)

1) Section 27. 7. 
Answer.


a. Only one record type is assumed.

The alphanumeric term will require on the average 11 characters if stored in 
varying fields (average size 10 plus 1 for term separator). 

It would require a field of the size of maximum length, hence 30 characters if 
stored in fixed fields. 

The 10 numbers of size 7 will have their maximum size in about 90 % of their
occurrences, for they will be shorter than 7 digits only when the first digit has
value 0. Hence nothing can be saved by storing these terms in varying fields.
Thus 70 characters are required. The two-valued code will need one character. 

The last term would be 1 character in 95 % of the cases which might indicate the 
use of varying field but as that would require one separator character we would 
still need 2 characters (and in 5 % of the occurrences we would need 3). Hence 
a fixed field of two characters will be allocated. 

Total average record size will thus be 84 characters, 

b. Different record types are used in the file, and no separate work areas for each 
type of record is used. 

In this case we will need a separator with all terms. This adds 12 more separator
characters. However in that case the reason for using fixed field for the
last term disappears. We use varying field which reduces its average size by 
about one character. Thus 12-1 = 11 characters are added under this condition, 
to give 95 characters as the answer. 

c. Different record types are used and a record is moved to the specific area for 
its type before it is processed. 

In this case the space required for the record in the file storage will be as in a, 
above. 
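
The answers for cases a and b can be reproduced with a short sketch. The function names and the way the terms are listed are assumptions; the sizes (average 10 characters for the alphanumeric term, ten 7-digit numbers, a one-character code, a last term of one character in 95 % of the cases and two otherwise) are the ones used in the answer above.

```python
SEPARATOR = 1  # one character per term separator


def average_size_case_a() -> float:
    """One record type: only the alphanumeric term is stored in a varying field."""
    alphanumeric = 10 + SEPARATOR   # average size plus its separator
    numbers = 10 * 7                # fixed fields at maximum size
    code = 1
    last_term = 2                   # fixed field of two characters
    return alphanumeric + numbers + code + last_term


def average_size_case_b() -> float:
    """Several record types, no work areas: every term carries a separator."""
    last_term_average = 0.95 * 1 + 0.05 * 2
    terms = [10] + [7] * 10 + [1, last_term_average]
    return sum(terms) + SEPARATOR * len(terms)


print(average_size_case_a(), round(average_size_case_b()))
```

Case b comes out slightly above 95 because the last term averages 1.05 characters; the text rounds this saving to "about one character".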


Remark. One contribution to file size which is often forgotten in preliminary
analysis is data terms needed for the administration and structuring of the data
processing and the files themselves.
5 Example


File Sizes for the Simplified Firm Example. 1)

In order to determine the sizes of the files we list as is usual • . 

i«: ecords) ’ the output or “ —' 

may hav“me P des^: WMCh *" * PhySiCal store. It 


                                 size
                                 decimals   bits

Record label                     3          12
Item no (with check digit)       5+1        24
Quantity in store                4          16
Date of inventation              6          24
Reason for inventation           3          12

Total size                       22         88


item lred ' ^ 18 I2bltS ’ We haVe farther assumed that the number of distLIT 
check S£ ZT use" ^ *“* *" “ 

3_5 is a file consisting of messages about sales and may be regarded as the
central information for keeping the system studied in operation. A sales order
contains a number of "lines", each specifying a request for one kind
of item. We may therefore regard it as consisting of two (or more) different
elementary records, "3_5 a" and "3_5 b".


3_5 a

                                     size
                                     decimals   bits

Record label                         3          12
Order number                         5          20
Customer number (with check digit)   5          20
Date of order                        6          24

total                                19         76


1) Continued from 25.1-19.
Note: We have left out name, address, credit conditions and other information
about customers because we have estimated it to be most economical to
let the information system supply that information itself. This calls for
a "customer file" which must then be added to our system. We add it
below to the set of standing files. We give it number "15". "15" will
be needed as an input to process "5" but as it cannot be sorted in the
order used for "5" we shall need a special process for making records
of "15" available to "5". 1)

                                 size
                                 decimals   bits

Record label                     3          12
Line no within order             2          8
Item no (with check digit)       6          24
Quantity ordered                 4          16
Delivery date                    6          24
Delivery conditions              1          4
Pay conditions                   1          4

total                            23         92


Every record of type 3_5 b is taken to be associated with the 3_5 a record
immediately preceding it (possibly separated from it by some other 3_5 b records). We do
not show, in this example, any more designs for input records. As an
illustration of an output record we take "5_1", the "order to ship" sent to "store".

Again two records are used, one specifying the customer and the addressee
and another specifying items and quantities.


5_1 a

                            size
                            AN-characters   decimals   bits

1 Customer no                               5          20
2 Customer name             15                         90
3 Customer address          15                         90
4 Ship to                   15                         90
5 Address for shipping      15                         90
6 Order no                                  5          20
7 Date of delivery                          6          24
8 Delivery condition                        1          4

  total                     60              17         428


5_1 b

                            size
                            decimals   bits

1 Item no                   6          24
2 Quantity to ship          4          16

  total                     10         40


1) We have to observe later that file 15 is not represented in the matrix P given 
earlier. 



We do not discuss the other output record designs but go over to the standing 
file record "8". 

We see from P 1) that "8" or any of its versions is input to "4, 5 and 6". We
therefore inspect all these to find out what information must be stored in "8" 
in order to serve them: "4, 5, 6" will need 
"quantity on hand" 

and "quantity on order" from "8". 

                                     size
                                     decimals   bits

1 Record label                       3          12
2 Item number (with check digit)     6          24
3 Item class                         1          4
4 Quantity on hand                   4          16
5 Quantity on order                  4          16

  total                              18         72


In the record we have added a term "Item class". This we have done in anti¬ 
cipation of a need to handle different items in a different way. 

The standing file "13", through its version "13'", is input to "9 and 10". 

"13" is also input to "13'". "13'" will need information from "13" about past 
demand. We will assume here that this will be taken from sales orders. The 
forecast of demand may use a set of past sales orders and this set would then 
have to be stored in "13". An alternative which is usually preferable is to use 
exponential smoothing for prediction. This eliminates the need to store several 
past sales orders. 

One will then have to store only the smoothing parameter α, the last smoothed
value of demand (predicted value), and the current actual demand 2).

If one considers it desirable to use also double and triple smoothing this defines
a need for some more data 3). We assume for simplicity, in this example, that
these can be neglected.

"14" will need data about past delivery lead times in complete analogy with the
need for demand data for "13".
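
Why only three values need to be stored can be seen from a sketch of single exponential smoothing. The text does not state the update formula; the form used below is the standard one and is an assumption here, as are the function name and the sample numbers.

```python
def smooth(alpha: float, last_smoothed: float, current_demand: float) -> float:
    """Standard single exponential smoothing step (assumed formula).

    new smoothed value = last smoothed value
                         + alpha * (current demand - last smoothed value)
    """
    return last_smoothed + alpha * (current_demand - last_smoothed)


# Hypothetical figures: alpha = 0.1, last prediction 100, actual demand 120.
new_smoothed = smooth(0.1, 100.0, 120.0)
print(new_smoothed)
```

Each period the new smoothed value replaces the old one, so no history of past sales orders has to be kept in the file.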


1) 25.1-11 

2) See R. G. Brown, Smoothing, Forecasting and Prediction (1 Br 1982). 

3) Section 27. 7 






13

                                     size
                                     decimals   bits

1 Item no                            6          24
2 Current demand                     4          16
3 α-demand                           3          12
4 Last smoothed (predicted) value    4          16
5 New smoothed demand                4          16
6 Forecast from Sales                4          16
7 Predicted demand                   4          16

  total                              29         116


The term "6" (Forecast from Sales) indicates that we have planned to compute
demand prediction not only by numerical extrapolation of demand history but
also to let estimates from "Sales" have an influence. The predicted demand, term
"7", is composed from "5" and "6". We do not need to work out the exact
formula or algorithm already now. The term "6" need not necessarily be filed as
it is used just for the computation of term "7". We have decided to introduce
it into the record, however, in order to have it available for possible statistical
analysis.

For the file "14" a completely similar record will be obtained, where delivery
lead time takes the place of demand and terms "6 and 7" are deleted.

It would be possible to replace files "13 and 14" by a single file, giving "demand 
during lead time". This is the common way. We do not go into details of that 
analysis here, however. 

Again we skip the analysis of the file "15". It may be wise to point out that we
have made a simplified and brief analysis, so that further analysis might well 
have indicated the need to have more data in the file. In comparing with existing 
systems we must remember, however, that we have defined elementary files 
whereas common files are to be considered as aggregates of several elementary 
files, at least this is true for the standing files. 

We can now insert our record sizes in tables 1, 2 and 3. For those records 
which we have not analyzed we put the size equal to 100, for simplicity, except 
for output records which we have, equally arbitrarily, set to 300 bits. 
Table 1

Input records. Numbers within parentheses are "numbers of characters".

File    Size of records    Number per day    Size in 1000 bits    Contents
        (bits)                               (1000 characters)
        a       b          a       b         per day

1_8     88      -          100     -         8.8                  Physical inventation reports
        (22)                                 (2.2)

2_7     100                5                 0.5                  Delivery documents, from shop
        (25)                                 (0.1)

3_5     76      92         200     600       70.4                 Sales orders
        (19)    (23)                         (17.6)

3_13    100                1                 0.1                  Forecasts made by Sales
        (25)                                 (0.03)

11      100                0.1               0.01                 Specifications of inv. holding costs
        (25)                                 (0)

12      100                0.1               0.01                 Specification of out-of-stock costs
        (25)                                 (0)


Table 2

Output records

File    Size of records    Number per day    Size in 1000 bits    Contents
        (bits)                               (1000 characters)
        a       b          a       b         per day

4_1     300                5                 1.5                  Refill order (copy)
        (50)                                 (0.25)

4_2     300                5                 1.5                  Refill order
        (50)                                 (0.25)

5_1     428     40         200     600       109.6                Order to ship
        (77)    (10)                         (22.9)

6_1     300                100               30                   Initiate phys. inventation
        (50)                                 (5)

7_1     300                5                 1.5                  Refill delivery
        (50)                                 (0.25)
Table 3

Standing file records 



13 

14 

15 



In making up the tables 1, 2 and 3 we have made reasonable assumptions about
the numbers of the different input and output messages a day as well as about
the number of items in store and the number of customers. (These latter
assumptions were made already to determine the number of digits necessary for item
number and customer number respectively.) We have also assumed that in files
where there are two different records, a and b, there are on the average
three b records for each a record. This would mean, for instance, that the
average number of lines on a sales order would be three. 1)


1) Example continued in 27.3-5. 
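
The "size per day" columns of tables 1 and 2 follow directly from record sizes and daily record counts. A minimal sketch, with a hypothetical function name and the figures for the sales orders, the order to ship and the inventation reports taken from the tables:

```python
def daily_size_kbits(size_a: int, count_a: float,
                     size_b: int = 0, count_b: float = 0) -> float:
    """Daily file size in thousands of bits for a file with a and b records."""
    return (size_a * count_a + size_b * count_b) / 1000


# Sales orders: 200 a records of 76 bits, 600 b records of 92 bits.
print(daily_size_kbits(76, 200, 92, 600))
# Order to ship: 200 a records of 428 bits, 600 b records of 40 bits.
print(daily_size_kbits(428, 200, 40, 600))
# Physical inventation reports: 100 records of 88 bits.
print(daily_size_kbits(88, 100))
```

The ratio of 600 b records to 200 a records reflects the assumption of three lines per order.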
Transport Volume, Processing Period.

When a file is stored in blocks of fixed size there will normally be unused
block space. If the storage is of variable block size, as is usual for tapes,
then there will be inter-record gaps which also take up space.

1 We define the file volume to be the number
of bits actually used up by the file when stored in file storage.

The file volume will often be equal to the sum of the file size plus the number of
bits corresponding to gaps or unused block space. This is however not always
the case, for we may modify this number by using a different mode of storage
than the "normal mode" used to define the size of the file. We may, for
instance, store a decimal integer in binary
form, which will reduce the volume.

While the file size will be a fixed quantity - or at least will be so regarded
during certain phases of system design - the file volume will be changed by
design decisions. We may increase it in order to save memory space or we
may try to reduce it. This will depend on what memory space
will be available and what alternative uses we have for it.

More precisely it will not be the volume of a file that we will try to reduce in
the design work but rather the processing time that will result from its volume.
Therefore we shall often talk about this time and, in fact, use it as a measure
of the volume of the file.

2 We shall define the file transport volume to be the
time (in some suitable units, hours in some cases) taken to transport the file. 2)

It is seen that a file of a certain fixed size will have its transport volume vastly
reduced when it is copied from a punched form (punched tape or punched cards)
onto magnetic tape or into some other common type of file storage.

2a Analogously we define the file transport size to be the transport time corres¬ 
ponding to the file size and the nominal transport speed. 

3 The number of transports for a file, over a certain reference period of time,
is the number of times it has to be transported during that period.

1) section 2.

2) It is important to remember that if the file storage used necessitates set-up
time and dismount time (as for instance magnetic-tape reel set-up, rewind
and reel dismount) this time is also included in the transport.
We have seen that the number of times a file need be transported depends on
the relations between all information sets and processes in the information
system, that is on the information precedence relations, and on the
number of times the different processes will be run during the
reference period.

(reference system).

6 We define the transport for one file to be the number of
transports during the reference period multiplied by the file transport volume.

7 We define the transport, trp, in the system to be the sum of the products of
file transport volumes and number of file passes in the reference system. Thus
if μ is a vector of file scan multiplicity in the reference system, that is μ is a
vector with one element for each file in the system, and such an element has its
value equal to the number of scans of that file in the reference system, and
if v is a vector with elements giving the respective file transport volumes,
then

trp = μ · v (the scalar product)

computes the trp in the system (for a reference period).
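
The scalar product defining the system transport can be sketched as below. The function name and the sample vectors are hypothetical and are not the values of the example that follows in the text.

```python
def system_transport(multiplicity, volumes):
    """Scalar product of scan multiplicities and file transport volumes."""
    if len(multiplicity) != len(volumes):
        raise ValueError("one multiplicity and one volume are needed per file")
    return sum(m * v for m, v in zip(multiplicity, volumes))


# Hypothetical system of three files.
scans = [2, 1, 3]       # scans of each file in the reference period
volumes = [10, 1, 4]    # file transport volumes, in time units
print(system_transport(scans, volumes))
```

A file that is scanned often and is slow to transport dominates the sum, which is why such files are the first candidates for redesign.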
9 Example,


In the example above, 5, we had: 


a b c d e a 1 



Suppose the file transport volumes are 

v: a b c d e a^ 

10 1 1 1 1 10 

The system transport becomes 

a b c d e a 1 

trp = p * v = 2-10 + 1+ 1+T + T+ ~ 

4 4 4 

or 

trp = 35, 


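10 It may be useful to define the volume factor η_v:

\[ \eta_v = \frac{\text{file volume}}{\text{file size}} \]

11 and the speed factor η_s 1):

\[ \eta_s = \frac{\text{actual transport speed}}{\text{nominal transport speed}} \]

As we have

file transport size trps:

12 \[ \mathrm{trps} = \frac{\text{file size}}{\text{nominal transport speed}} \]

and

file transport volume trpv:

13 \[ \mathrm{trpv} = \frac{\text{file volume}}{\text{actual transport speed}}
      = \frac{\eta_v \cdot \text{file size}}{\eta_s \cdot \text{nominal transport speed}} \]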


1) Notice that "nominal speed" has to account for such times as are involved in reel
change and tape rewind or in mechanical access movements. For example if tape
scan is done by 100 000 characters/second then rewind time and reel exchange time
may reduce the "nominal transport speed" to 60 000 ch/s. If block gaps and empty
space in records reduce actual trp-speed to 30 000 ch/s then η_s = 0.5. In case this
file happens to be transported on a unit with a speed that is twice that of the units
for which the "nominal" is determined we would have the speed factor = 0.5 · 2 = 1
for the specific file.
Hence

\[ \mathrm{trpv} = \eta \cdot \mathrm{trps} \]

where η = η_v / η_s can be called the transport volume factor.

η tells how many times larger than the transport size the transport volume is.
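
The relation between transport size and transport volume can be sketched as follows. The function names are hypothetical; the speed figures 30 000 and 60 000 ch/s are those of the footnote above, while the file size and the volume factor 1.5 are assumed values used only for illustration.

```python
def speed_factor(actual_speed: float, nominal_speed: float) -> float:
    """Ratio of actual to nominal transport speed."""
    return actual_speed / nominal_speed


def transport_size(file_size: float, nominal_speed: float) -> float:
    """Time to move the file size at nominal transport speed."""
    return file_size / nominal_speed


def transport_volume(file_size: float, nominal_speed: float,
                     volume_factor: float, speed_fac: float) -> float:
    """Transport size scaled by the transport volume factor."""
    return (volume_factor / speed_fac) * transport_size(file_size, nominal_speed)


eta_s = speed_factor(30_000, 60_000)
print(eta_s)
# Assumed file of 120 000 characters with an assumed volume factor of 1.5.
print(transport_volume(120_000, 60_000, 1.5, eta_s))
```

Here a file with a transport size of 2 seconds ends up with a transport volume of 6 seconds.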


Example, transport size: 

(continuation of ex 5, section 27.2) 

In computing file transport sizes for input files on punched tape or punched cards 
the analysis into bits will often be unnecessarily fine. Also, while it may make 
quite a difference in punching operations whether numeric characters only or 
alphanumeric characters are to be used, the input speed will, as a rule, be in¬ 
dependent of this. It will therefore often be sufficient, as far as input of punched 
data is concerned, to measure file sizes in characters and measure speed in 
characters per second. As the table of input files made up earlier was made in 
terms of 4 bits per character we can most easily proceed by computing the no¬
minal input speed for the assumed medium in characters per second and then 
multiply by four to obtain an equivalent number of bits per second. In this way 
we obtain too low ideal speeds as the actual number of bits that could be carried 
by a card column is much larger than four. On the other hand, if we do not ex¬ 
pect to have to compare with alternative types of recording, this effect can be 
neglected without any drawback. Let us assume that input files are on punched
cards and that 600 cards per minute is the nominal reading speed. This amounts 
to 10 cards per second or 800 characters per second or 3200 bits per second 
(equivalently). 

Output files can be analogously treated. 

We assume 1000 lines/minute printing speed, 120 char/line which amounts to 
2000 char /second output speed. We assume 400 000 bit/second to or from mass 
memory. 
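
The nominal speeds quoted above follow from simple unit conversions. In this sketch the function name is hypothetical and the figure of 80 columns per card is an assumption implied by the 800 characters per second given in the text.

```python
def chars_per_second(units_per_minute: float, chars_per_unit: int) -> float:
    """Convert cards or lines per minute into characters per second."""
    return units_per_minute * chars_per_unit / 60


card_speed = chars_per_second(600, 80)       # 600 cards/minute, 80 columns
card_speed_bits = card_speed * 4             # 4 bits per character, as in table 1
printer_speed = chars_per_second(1000, 120)  # 1000 lines/minute, 120 char/line
print(card_speed, card_speed_bits, printer_speed)
```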


Table 1 


Table 2 


Table 3 


input files 


output files 




1 S = 26 1 

800 char/second 


standing files 





transport size 
for one input 
and one output 



2000 char/second 500 000 bits/second 
is more suitably taken up when discussing transport volumes.

16 Example, transport volumes.

We assume the input files to be on punched cards, thus to have a file volume
of 80 characters per record. For file 1_8 we had a record size of 22 characters and,
there being one record per card, we have a volume factor = 80/22 = 3.65.
In other cases one may obtain a much better volume factor, close to 1.

The speed factor would be about η_s = 0.8, mainly caused by interaction effects.
The transport volume factor η is thus (for file 1_8) = 3.65/0.8 = 4.6.

In the case of card files it is, however, simpler not to use the factor η but
to compute the transport volume directly from the number of cards
involved and the "card speed" modified with the speed factor. In our example
this gives the effective card speed 10 · 0.8 = 8 cards per second. From our
earlier data 1) we obtain for file 1_8: 100 cards per day, hence 100/8 = 12.5
seconds. For the other input files we proceed in similar fashion. The result is
recorded in table 4. If we apply the same procedure for the printed output we
obtain table 5.


Table 4

Card input files
Transport volumes in a reference period (one day), seconds

1_8     12.5
2_7      0.6
3_5    100.0
3_13     0.1
11       0.0
12       0.0


Table 5

Printer output
Transport volumes in a reference period (one day), seconds

4_1      0.4
4_2      0.4
5_1     61.5
6_1      7.7
7_1      0.4


We see, on comparing with tables 1 and 2, that the transport volumes are
larger than the transport sizes, thus indicating a large
effect of the hardware properties.


1) table 1 of section 27.2 
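
The card input figures of table 4 can be recomputed with a short sketch. It assumes one record per card, the nominal speed of 10 cards per second and the speed factor 0.8 from the example; the file labels and the function name are notational assumptions, and the daily card counts are the record counts of table 1.

```python
NOMINAL_CARDS_PER_SECOND = 10
SPEED_FACTOR = 0.8

# Cards per day, one record per card (3_5 has 200 a plus 600 b records).
CARDS_PER_DAY = {"1_8": 100, "2_7": 5, "3_5": 800, "3_13": 1, "11": 0.1, "12": 0.1}


def card_transport_seconds(cards: float) -> float:
    """Transport volume in seconds at the effective card speed."""
    return cards / (NOMINAL_CARDS_PER_SECOND * SPEED_FACTOR)


table_4 = {name: round(card_transport_seconds(n), 1)
           for name, n in CARDS_PER_DAY.items()}
print(table_4)
```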
To determine the file transport volumes for the standing files on magnetic
or direct stores is much more complicated and requires a set of choices to be
made first.

The file volume will depend on the recording form. If binary coding is used, for
instance, we need no "code allowance" but may have a word allowance of 30 %
if data are not packed to words, and if data are packed we may still have about
10 % of allowances when subrecords terminate within a word and any
new record starts a new word. If instead data are stored as 7-bit characters,
which is by far the most common to-day, then we use twice as many bits for
numeric data as are necessary. As most data are numerical we find that in
character based coding we obtain about 100 % code allowance, corresponding to a
volume factor of 2. As it is obvious now that new equipment is going to be far
less wasteful with mass storage space we can assume a factor of, say, .
as typical and use this in our example. This is still not the whole waste of
space however, and we will have to add at least space for inter-record gaps
and, most likely, for unused block space (when fixed record length is used).

We now make the assumption that magnetic tapes will be the mass storage
medium used. To determine the effect of inter-record gaps we assume 4000 bits
per gap which is about typical. As we found our (elementary) standing files to
have a record size of about 100 bits we see that if we would store each
elementary record separately we would obtain a volume factor of 40 already because
of gaps. This is of course quite unacceptable. In practice the consolidation of
elementary files to more complex files with much larger records and the
blocking together of several records into one physical record means that we will have
sizes of physical records between 1 000 and 20 000 bits, corresponding to a
volume factor of 4 to 1.2.

How much blocking of records to have in order to reduce the volume factor
depends on the memory space available and this can not yet be decided. As a
first trial it is practical to use short blocks, 1) say 1 000 bits, for files with
small size and, say, 10 000 bits for large files. We would thus obtain the
volume factors 4 and 1.4 respectively.

Table 6

file volume = $\eta_{v1} \cdot \eta_{v2} \cdot$ file size

$\eta_{v1}$ = 1.5 = effect of code allowance, word allowance etc.
$\eta_{v2}$ = effect of record gaps
$\eta_{v2}$ = 4 for record size 1 000
$\eta_{v2}$ = 1.4 for record size 10 000

[Table body lost in extraction; surviving entries: 100 000 bits, 10 000 bits.]

1) We use the term "block" in place of the clumsy "physical record".

To determine the file transport sizes we must also determine the speed factor.
This depends on rewind time and reel replace time, and the latter has an effect
which is larger the smaller the file is, in case the file volume makes up less
than a full reel. Also the use, or not, of simultaneous read and write has a
large effect on the speed factor, and also the possible use of a "flip-flop"
arrangement of tape handlers to eliminate the effect of reel replacement time.
A read backward feature eliminates some rewind time. The rewind effect is easy
to determine as one part of the speed factor.

Thus if there is no flip-flop arrangement and if rewind is "r" times faster than
read/write speed then this generates a speed reduction factor

$\eta_{s1} = 1/(1 + 1/r)$. Let us assume for this illustration that r = 3, which
is common, then we get $\eta_{s1} = 1/1.33 = 0.75$.

To determine the effect of reel replacement we assume that this takes 100
seconds. If we assume a tape length of 29 000 inches and a recording density of
[...] bits per inch, the gross storage volume of a tape is 200 million bits.
Now file "8", which is the largest, has a volume of about 10 million bits
(see table [...]). This is only 1/20-th of a reel, which appears to indicate a
very high effect of replacement time. Recall however that the files we have
described are elementary files which, when stored on tapes, are likely to be
aggregated to other files. Again, how this consolidation of elementary files
will be done is one of the things we have to decide upon during design and
therefore is unknown to us at the moment. However, as mentioned above, the
records of our elementary files appear to be at least ten times smaller than
common consolidated files. As a first approximation to the computation of the
effect of reel replacement time we may therefore make the assumption that one
tenth of a reel replacement time has to be charged to one of our elementary
files - if it has to undergo a replacement, which is not always the case.

We have come to the conclusion that to determine the file transport volume
when there is no flip-flop arrangement and when the file goes on one single
tape reel we use $\eta_{s1} \cdot$ speed, or 0.75 · 400 000 bits/second =
300 000 bits/second, and then add 10 seconds.
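
The rule just stated can be written out as a short sketch. The function and parameter names are our own, and the file volume of 9 million bits used in the example is an assumption (it is the volume that gives 30 seconds at 300 000 bits/second, in line with the "about 10 million bits" quoted for file "8").

```python
def rewind_speed_factor(r: float) -> float:
    """Speed reduction when rewind is r times faster than read/write."""
    return 1.0 / (1.0 + 1.0 / r)


def transport_seconds(volume_bits: float,
                      nominal_speed: float = 400_000,  # bits per second
                      r: float = 3.0,
                      reel_charge_s: float = 10.0,     # one tenth of 100 s
                      flip_flop: bool = False) -> float:
    """Transport time of one file pass under the assumptions of the text."""
    if flip_flop:
        # With flip-flop the speed factor is 1 and no reel charge is added.
        return volume_bits / nominal_speed
    effective_speed = rewind_speed_factor(r) * nominal_speed
    return volume_bits / effective_speed + reel_charge_s


# Assumed volume of 9 million bits for the largest file.
print(transport_seconds(9_000_000))
print(transport_seconds(9_000_000, flip_flop=True))
```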


Table 7

File    file volume / 300 000    trp volume    trp volume with flip-flop
        (seconds)                (seconds)     (seconds)

 8            30                    40                23
13            49                    59                36
14            35                    45                26
15             3                    13                 2

We have entered the results in table 7. When there is flip-flop on files 8, 13,
14 then the 10 seconds are not to be added and $\eta_{s1} = 1$ (rather than
0.75). This is also shown in table 7.

The transport volumes of table 7 are very small. If we recall again that the
elementary files are likely to be consolidated with other files so that ten
times larger volumes occur, and that files are likely to be transported several
times, we may expect realistic times to be about 40 times larger. For instance
for file "8" we would then come up to 1 200 seconds if we use no flip-flop and
no simultaneous read-write. 1)


1) Example continued in 27.4-3.

[...] to compare successive results with those obtained at other places. Before
that can be done we must, of course, compensate for differences in file sizes.
Thus, if we compare with another system having twice the transport size on all
files (assuming for instance double [...]) we must multiply by two before we
can reasonably do the comparison. In most cases we will find that the situation
is more complicated. It may for instance be that some files have greater
transport sizes in one system while other files have greater transport sizes in
the other. How then do we normalize the results of the different systems to
make a comparison reasonably meaningful? And what do we do in order to be able
to compare different solutions for our own system with different hardware,
leading to a different measure of file volumes?

We want a measure which gives the same value if all files are increased in the
same proportion and the same structure of design is used. But we also want the
measure to show a change if, with the same total file size, we increase such
files which in the design used are transported several times, that is, if the
file transport is changed. In sum, the measure shall reflect the influence of
this change of system transport. We obtain such a measure if we take the total
transport, $\mu^T v$, and divide it by the file transport volume.

However we want the measure to be increased if we increase the volume by using
shorter blocks, for instance. This is because one of the important problems in
the system design will be whether to allocate memory space to permit longer
blocks or rather to use this space, for instance, to reduce multiplicity of
file transports. As we want to be able to check the design results by our
measure we want it to reflect volume changes which are not caused by change in
transport size. We see that this is achieved if we define

1   the transport factor by

    $\eta_{trp} = \dfrac{\text{total file transport}}{\text{total file trp size}} = \dfrac{\mu^T v}{\mathbf{1}^T s}$

where $\mathbf{1}$ is a vector having all element values = 1, so that the scalar
product $\mathbf{1}^T s$ equals the sum of all values of s.

Let the file transport size vector be

           a    b    c    d    e    a'
$s^T$  =   10   1    1    1    1    10

Total transport size = 24


This includes time for setup, dismount and rewind. We have recorded both the
file a and its updated version a'. This we can do at the outset because we know
that a is a file of the standing file type, and is updated once per reference
period.

We may well come to make a design wherein a will be updated several times,
passing intermediate stages such as a", a"' ... This we do not know at the
outset and we do not yet ask for this kind of information. It will be taken
care of by the transport multiplicity vector. Therefore a", a"' etc. are not
counted in the "size".

To determine the transport we make the following assumptions for this example:

mode of storage: 6-bit characters
record lengths: (files a, a') = one record gap; the other files = 1/2 record gap
numeric data only
no record blocking


Note that in ADB systems where files are not copied during updating the
updated version a' of a should also not be counted in the total size. Also, if
random access is available to files, then at most part of the unused
(unchanged) file data are affected by the data transport. In such cases one
may come to find that only input and output data should contribute to the
total size.

The storing of numeric data in 6 bit characters increases all file volumes by
the factor 6/4 = 1.5; a is further increased, by factor 2, for gap space, and
the other files get a factor 3 for the same reason.

The file transport volume vector becomes:

           a    b     c     d     e     a'
$v^T$  =   30   4.5   4.5   4.5   4.5   30

Total file transport volume = 78 (to be compared with the size = 24).

Now that we have a lay-out of the information processing network we can define
the transport multiplicity vector.

We count all intermediate transport for a (i.e. a", a"', ...) under a.

             a   b   c   d   e   a'
$\mu^T$  =   3   1   1   1   1   1

The resulting file transport becomes

$trp = \mu^T v = 3 \cdot 30 + 4 \cdot 4.5 + 1 \cdot 30 = 138$

and the transport factor

$\eta_{trp} = \dfrac{trp}{trps} = \dfrac{138}{24} = 5.6$
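
The computation of this example can be retraced with a few lines of code. This is only a sketch: the function names are ours, and the size vector (10 for a and a', 1 for the other files) is an assumption chosen to agree with the stated total size of 24 and with the volumes 30 and 4.5. Note that straightforward division of 138 by 24 gives 5.75, slightly above the figure 5.6 quoted in the text.

```python
FILES = ["a", "b", "c", "d", "e", "a'"]

# Assumed sizes, consistent with total size 24 and the volume factors 3 and 4.5.
size = [10, 1, 1, 1, 1, 10]
volume = [30, 4.5, 4.5, 4.5, 4.5, 30]   # file transport volumes from the text
multiplicity = [3, 1, 1, 1, 1, 1]       # transport multiplicity vector


def total_transport(mu, v):
    """Scalar product of multiplicity and volume vectors."""
    return sum(m * x for m, x in zip(mu, v))


def transport_factor(mu, v, s):
    """Total transport divided by the total file transport size."""
    return total_transport(mu, v) / sum(s)


print(total_transport(multiplicity, volume))
print(transport_factor(multiplicity, volume, size))
```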


2 Example.

What will the transport factor be if in the example above we choose

a. the processing period twice as long

b. another application where all files have double size

c. another solution where numeric information in standing files is stored
   packed in 24 bit binary words, averaging 4 bits per decimal digit (rather
   than 6 bits), and all standing file data are numeric (assuming the same
   number of bits per second transport speed); and what is the percentage of
   change in total transport volume and in $\eta_{trp}$?


a.

If we double the processing period but keep the same reference period, the
only change in file transport sizes will be that standing file multiplying
figures are cut to half their original values.

             a     b   c   d   e   a'
$\mu^T$  =   1.5   1   1   1   1   0.5

Hence $trp = \mu^T v = 1.5 \cdot 30 + 4 \cdot 4.5 + 0.5 \cdot 30 = 45 + 18 + 15 = 78$

$\eta_{trp} = \dfrac{trp}{trps} = \dfrac{78}{24} = 3.4$

Relative reduction of $\eta_{trp}$:   (5.6 - 3.4)/3.4 = 65 %
Relative reduction of trp:            (138 - 78)/78 = 76 %


b.

When file size is doubled and nothing else is changed we would get the double
of file transport and the same transport factor.

However, when the file sizes are increased the relative influence of set-up and
take-down time may be reduced. This would then make the file transport volume
less than twice the other one when file size is twice the other.


c.

When the file data are packed in 24 bit words with 4 bits per digit on the
average the file volumes go down by a factor of 4/6.

When numeric data are stored with 4 bits per digit this means in our example
that the only increase in file volumes (as compared with file size) is due to
the record gaps. The relative influence of these gaps, however, will now be
greater if we still do not use record blocking.

While file a had a gap equal to the record length, it now becomes equal to 6/4
times the record length. Thus a grows with the factor 2.5 this time, to a
volume of 25. The other files get the factor 3 · 1.5 = 4.5 for the gap
influence and are reduced in size by the factor 4/6, or by 4.5 · 4/6 = 3 in
total. We thus get

             a    b   c   d   e   a'
$v^T$    =   25   3   3   3   3   25      ; total volume = 62
$\mu^T$  =   3    1   1   1   1   1

and

$trp = \mu^T v = 75 + 12 + 25 = 112$

$\eta_{trp} = \dfrac{trp}{trps} = \dfrac{112}{24} = 4.7$

Thus while the total volume is reduced in the relation 62/78 = 0.8 when only
4 bits per digit are used, $\eta_{trp}$ is reduced by 4.7/5.6 = 0.84.
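
Cases a and c of this example can be retraced in the same manner. The sketch below uses our own names and takes the total size 24 from the text; it reproduces the transports 78 and 112 and the total volume 62. Dividing directly, 78/24 gives 3.25 and 112/24 gives about 4.67, so the rounded factors quoted in the text should be read as approximate.

```python
TOTAL_SIZE = 24                      # total file transport size from the text


def total_transport(mu, v):
    """Scalar product of multiplicity and volume vectors."""
    return sum(m * x for m, x in zip(mu, v))


# Files in the order a, b, c, d, e, a'.
v_base = [30, 4.5, 4.5, 4.5, 4.5, 30]
mu_base = [3, 1, 1, 1, 1, 1]

# Case a: processing period doubled, standing file multiplicities halved.
mu_a = [1.5, 1, 1, 1, 1, 0.5]
trp_a = total_transport(mu_a, v_base)

# Case c: 4 bits per digit, gaps unchanged, volumes as derived in the text.
v_c = [25, 3, 3, 3, 3, 25]
trp_c = total_transport(mu_base, v_c)

print(trp_a, trp_a / TOTAL_SIZE)
print(trp_c, trp_c / TOTAL_SIZE, sum(v_c))
```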


3 Example.

(Continued from 27.3-16)

Transport factor for the simplified information system.

The transport factor measures such effects as volume increase and speed
reduction for hardware reasons as well as the multiple file transports.

[...] Then we also have to add sorting processes for the input files, [...]
after it has been shown that this is actually feasible and efficient. [...]
transport volumes than the corresponding punched files.

An analogous discussion holds for the output files. 

As a consequence we should now modify the precedence matrix P by adding the
[...]. One will also require, for the sorted version of a file, this to be
transported a number of times, which we may represent for instance by
indicating a corresponding precedence magnitude in the matrix P. This number
will depend on the file size and the sorting procedure, and we must then, of
course, also account for the number of sorting operations that will be done in
a reference period.

If a longer processing (or updating) period is used, a smaller number of
sorting operations will be done during the reference period, but the file to be
sorted has then been collected from transactions during a longer period. Hence
it will be larger and the sorting process will require more transports. As this
is proportional to $\log_2$ (file size) we find that, roughly speaking, each
doubling [...] in the sorting process [...] preliminary analysis. Therefore we
assume in this example that average transport for sorting during a reference
period varies in inverse proportion to the updating period (as is also the case
for updating of files).

We make the assumption that every sorting operation is done in 20 file passes.
If, for instance, $l_8$ is the tape version of the input file $l_0$ and
$l_8''$ [...] from the process creating it, and then has to be input 20 times
to the sorting. This being a crude [...]

[...] n = 4 (which is most common) but is only slightly modified for other
values.

To compute the transport volume of the auxiliary files we make use of the sizes
(in bits) of these files as given in tables 1 and 2 of section 27.2. We thereby
assume that although these files are small, they are to be transported 20
times, so we use long blocks, corresponding to a volume factor of
1.5 · 1.4 ≈ 2 (as in 27.3, table 6). As the calculation of the effect of
sorting is done in a crude way here, we simplify further by reducing the
transport speed, not only multiplying the nominal speed (of 400 000 bits per
second) by the speed factor (0.75), as in 27.3-16, but then also dividing it by
the volume factor. The resulting effective speed to use in the analysis will
thus be 150 000 bits per second. Using this figure and the file sizes of tables
1 and 2 of section 27.2 we get the transport volumes for the auxiliary files as
represented in table 1.
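
The effective speed used for the sorting files follows from the three figures named above. The sketch below uses our own names; the file size of 750 000 bits in the example is purely hypothetical and only serves to show how the 20 passes enter.

```python
def effective_speed(nominal_speed: float = 400_000,   # bits per second
                    speed_factor: float = 0.75,
                    volume_factor: float = 2.0) -> float:
    """Nominal speed reduced by the speed factor and by the volume factor."""
    return nominal_speed * speed_factor / volume_factor


def sort_transport_seconds(size_bits: float, passes: int = 20) -> float:
    """Transport time for sorting a file of the given size (in bits),
    assuming every sorting operation takes `passes` file passes."""
    return passes * size_bits / effective_speed()


print(effective_speed())
print(sort_transport_seconds(750_000))   # hypothetical file size
```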


It should be noted that sorting on tapes usually requires set-up of 4 or more
tapes and therefore our neglecting this time here may introduce a significant
error, and in an analysis for an actual system one should be more careful on
this point.



proc. period 
ref. period 


Proc. period = 
5 ref. periods 

H a 


12.5 

0.05 

0.6 

0.003 

100 


0.47 

9,4 

1.8 

0.1 

0.3. 

0.1 

0.007 

0 

0 

0.4 

0.4 

0.4 

0.01 

0.2 

0 

0.4 

0.4 

0.4 

0. 01 

0.2 

0 

61.5 1 

61. 5 

61. 5 

0.72 

14.4 

2.8 

77 

77 

77 

0.2 

4 

0.8 

0.4 

0.4 

0.4 

0.01 

0.2 

0 


3.61 

364 


Total system transport

1) File 15 is not represented in P. It was introduced in 27.2-5. We assume it
   to be transported once only, in our small subsystem.


Hence:

Total file transport size for this subsystem = 95

We obtain the transport factor $\eta_{trp}$:

$\eta_{trp} = \dfrac{808}{95} = 8.5$    for daily processing

$\eta_{trp} = \dfrac{364}{95} = 3.8$    for weekly processing

4 Exercise. What is the value for $\eta_{trp}$ if

a) we use flip-flop

b) we use simultaneous input-output

Remark.

[...] the value of the transport factor [...] could be expected to be normal in
actual systems and why, especially for future systems, higher values are
probable. First, the transport factor measures effects of system integration
and may therefore be [...] over a subsystem only. One should therefore
supplement the determination of $\eta_{trp}$ over a subsystem with an estimate
of the number of [...] transported "across the intermediate [...] of the
subsystem", e.g. are transported to processes in other parts of the total
system. For instance the inventory status file "8" was found to have a
transport equal to 7 1), and this was established for the system [...], section
25.1, which is only a subsystem of the system shown in fig. [...], section
22.4. This system, in its turn, is only a small subsystem [...]. However, it is
obvious that the inventory status file (8) will have to be fed into more
processes, for it will, at least, be [...] the scheduling of the shop. This
file is also most likely to have uses outside this subsystem, for it will be
used in some financial analysis.

[...] studies have revealed that an inventory control system which [...] the
various items into consideration is [...] to lead to significantly improved
economy. Such procedures, in fact any procedure which [...] interactions
between system parts, will need more complex handling [...] and thus imply a
higher degree of [...] whether files are on sequential type or on direct-access
type storage.
1) See table 1, example 3.

we have done so far, is no longer sufficient. The reason for this is that the
work to be paid is normally performed in a time sequence of well defined single
jobs whereas the pay is to be computed per week, which normally implies summing
the pay for different jobs or alternatively summing the hours worked during the
week on each of the different work types.

6 We see that we need to have some information for each job done, so that we
know

to which week the job belongs
to which type the job belongs
which number of hours

"type" here tells both if the job is on incentive (piecework) or on hourly pay
and in the latter case which type of hourly pay.

7 In the case that the work type is determined by day of week and hour of day
we have the alternative

indication whether hourly pay or incentive (piecework)
start of job: week, day, hour etc.
finish of job: week, day, hour etc.


8 The fact that we have two alternatives here brings us into a decision situation. 

The decision can only be reached by considering questions which are for other 
people to decide - or for us to decide in another context. 

Both these cases present an illustration of a point in the systems work where
documentation is necessary: which decision is taken, why, by whom?

It is important for the systems work to handle thoroughly and carefully all such 
decision documentation. 

8 a Remark. The decision situation mentioned in 8 apparently most often is
regarded as being rather a "knowledge situation" where knowledge of the
situation presently in use in the system (the company, for instance) indicates
the choice to be made. This is not a very fortunate way of handling the problem
however, for it is entirely possible that the present situation is not the best
one. Only the explicit presentation of each "point of choice" in the system
design as a decision problem for the correct set of people to handle is
therefore satisfactory. Thus rather than trying to find out what is in use
to-day the problem is to find out what should be in use. The systems analysis
therefore is to be concerned about clear presentation of all decision points by
careful analysis and documentation and finding out what information is required
for decision - again an analysis of information precedence relations. This,
then, should indicate which people or groups are involved. The present way of
emphasizing the explorative part of the system analysis work therefore seems to
rather misinterpret the problem.

8 b A consequence of the arguments of 8a is that "company knowledge" should not
be given too much importance but be replaced by company oriented decision
operations. The man who "knows the company" may, more easily than other people,
overlook the decision points. Company knowledge will, of course, be [...]

Use of transport factor statistics

If we have statistical data for transport factors these can be expected to be a
function of memory size and the number of I/O units for each value of the
topological transport factor. As the topological transport factor would be the
same for the same kind of system we may have statistics for its value for
different classes of systems.

Then knowing the file volume (which may also be a function of the file size which 
is statistically evaluated) we would directly obtain the total transport time for 
an application when the processing period and memory size are determined. 
This would permit us to find the optimum by varying design factors. 




6. Grouping of computations into one process


We have seen that the basic topology of the system may call for multiple inputs
or outputs of an elementary file because it is a precedent of more than one
computation. It follows that file transport could be saved by grouping a set of
computations together in such a way that in connection with one and the same
process of reading an elementary file we use the file data for more than one
computation.

We shall then say that we have grouped computations into one process.

We study now how grouping computations can be used to save file transport in
the system having a basic topology defined by a precedence matrix.

It is worth noting that file transport is saved by grouping of processes for
every kind of mass memory used. If we use batch processing with magnetic tapes
or direct access storage we save one or more runs, and if we use direct
processing (on line real time) we still save some access operations for file
records (see examples given earlier). 1)

If we indicate also assumed file transport volumes for a reference period for
each file, along with $P^{00}$, it will be possible to see how much transport is
saved by taking together two or more computations into the same process. (A
practical instance of such a grouping is when we take two different transaction
types as input to the same run.)

We introduce also a multiplicity of 5 input scans in our example (just to
illustrate the possibility) for the file c when used as input to the
computation for d.

For simplicity we also assume all processing to be done once in the reference
period.

File volumes

1      a   b   c   d   e    f   g   h   i    j    k   l
       1   1   2   1   10   2   1   1   10   20   2   10

Total = 61 for the input volume.

Considering also the outputs e', i', j' and l' we get 111 for the total file
volume.

1) 24.1-18 and 24.1-19

2  $P^{00}$: precedence matrix with one column for each of the computations
   a, e', b, i', c, j', d, l' and one row for each of the input files
   a, c, e, f, g, h, i, j, k, l. [The individual matrix entries are lost in
   extraction; all non-zero entries are 1 except one entry 5 in row c.]

   Input file     a   c   e    f   g   h   i    j    k   l
   $v_{input}$    1   2   10   2   1   1   10   20   2   10


input 


The system as described by 2 may be called the basic topological system. It
should be defined for a chosen reference period. For simplification we have
taken most entries in $P^{00}$ to be = 1. Notice that they will change when we
change the updating frequencies.

3 We obtain from P the total transports of each input file in the topological
system. For instance for c we get (5 + 1) · 2 = 12.

For the input transport we obtain, from
$\mathbf{1}^T (P^{00})^T v_{input} = (P^{00} \mathbf{1})^T v_{input}$,

File                       a   c    e    f   g   h   i    j    k   l
Input transport volume     4   12   60   4   3   2   40   40   4   20     Total = 189

6 Total output transport volume = 55 for (a, b, c, d, e', i', j', l'), thus
total transport = 244 (189 for input + 55 for output if all output is single).


7 We see that the topology of the system of precedence relations requires an
increase of the transport value as compared with the volume of the files. We
measure this by the topological transport factor. 1)

1) See 25.5-3

Hence we find that in our example the computations system implies a transport
volume for input of 189 against the volume 59 (= 61 - b - d) for single input.
We here make the simplifying but unrealistic assumption that the file transport
sizes equal the file transport volumes. This gives a topological transport
factor of about 3.3.
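
The figures of this example can be recomputed from the file volumes and the number of scans of each input file. In the sketch below the scan counts are not read from the matrix itself; they are an assumption obtained by dividing the quoted input transport volumes by the file volumes (for c this gives 5 + 1 = 6, as stated in the text). Direct division of 189 by 59 gives about 3.2, in line with the "about 3.3" of the text.

```python
# Volumes of the input files, from the file volume list of the example.
volume = {"a": 1, "c": 2, "e": 10, "f": 2, "g": 1,
          "h": 1, "i": 10, "j": 20, "k": 2, "l": 10}

# Assumed number of scans per input file (row sums of the precedence matrix),
# derived from the quoted input transport volumes.
scans = {"a": 4, "c": 6, "e": 6, "f": 2, "g": 3,
         "h": 2, "i": 4, "j": 2, "k": 2, "l": 2}

input_transport = sum(scans[f] * volume[f] for f in volume)
single_input = sum(volume.values())
output_transport = 55                      # quoted in the text

topological_factor = input_transport / single_input
total_transport = input_transport + output_transport

print(input_transport, single_input, total_transport)
print(round(topological_factor, 2))
```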

8 The multiple transport can be reduced by taking some computations into the
same computer process. For this to be possible the computer memory must be
sufficient to store the programs for these computations and for their
coordination into one process, or else there will be data transport for
shuffling programs to and from the file storage when different records call for
one or the other - or both - computations.

9 Grouping computations corresponds to grouping columns of $P^{00}$. It is easy
to see from P how much transport is saved by grouping any computation with any
other. Thus from 2 we find that grouping a and e' saves one scan of e, f and g,
i.e. it saves a transport volume = 13.

10 No other computation saves as much when grouped with a. 1) We list all the
pairs which save most by being grouped.

a, e'   saves 13
b, i'   saves 21
c, j'   saves 32
d, l'   saves 23

Total transport saved by the groupings = 89

11 The other possible combinations, such as e, b, do not have to be considered
since whenever $P(e) \subset P(a)$, then
$Vol(P(e) \cap P(b)) \le Vol(P(a) \cap P(b))$.

(Vol(x) is used to denote the transport volume of x.)

That this is so follows from

12 $P(e) \subset P(a)$ is equivalent to $P(a) = P(e) \cup s$ (s = some set
$\ne \emptyset$), hence

13 $P(a) \cap P(b) = (P(e) \cup s) \cap P(b) = (P(e) \cap P(b)) \cup (s \cap P(b))$

and therefore

14 $Vol(P(a) \cap P(b)) = Vol(P(e) \cap P(b)) + Vol(s \cap P(b))$

As $Vol(s \cap P(b))$ cannot be negative the truth of the statement follows.
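
The argument of 11 to 14 can be illustrated with ordinary set operations. The sketch below takes P(a) = {e, f, g, h} and P(e') = {e, f, g} and the file volumes from the example; the precedent set chosen for b is purely hypothetical, since only the subset relation matters for the argument. It assumes that every common input file is scanned once by each computation, so that grouping saves one scan of each common file.

```python
VOLUME = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 10, "f": 2,
          "g": 1, "h": 1, "i": 10, "j": 20, "k": 2, "l": 10}


def vol(files):
    """Transport volume of a set of files."""
    return sum(VOLUME[f] for f in files)


def grouping_saving(p_x, p_y):
    """Volume saved by grouping two computations with precedent sets
    p_x and p_y: one scan of every common input file."""
    return vol(p_x & p_y)


P_a = {"e", "f", "g", "h"}
P_e1 = {"e", "f", "g"}          # P(e'), a subset of P(a)
P_b = {"e", "g", "h", "i"}      # hypothetical precedent set for b

s = P_a - P_e1                  # the remainder set s of relation 12

print(grouping_saving(P_a, P_e1))
print(grouping_saving(P_e1, P_b), grouping_saving(P_a, P_b))
```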


1) It is important to note that it does not follow, in general, that the best
solution is obtained by grouping e with a. Cf. example 27.

However we have found earlier that we have the order relations

15   $(a, e') < (b, i') < (c, j') < (d, l')$

It follows that $(a, e') < (d, l')$ and $(a, e') \not\ll (d, l')$, or in words:
(a, e') is a precedent but not an immediate precedent of (d, l'). This means
that among the inputs to (d, l') are some information sets which are produced
by means of output from a process following the process (a, e'). This, in turn,
means that before (d, l') can work it must have available results from a
process which in its turn can only run after pr(a, e') has produced its
results.

We can state this result by the

16 Restriction rule for Grouping: Processes cannot be grouped together if they
have precedence relations of more than 1st order.

As processes which are not neighbours 2) cannot profit from being grouped we
find that candidates for grouping are only such processes which are "strict"
neighbours. This reduces the grouping problem very much. However, a process
which is not a "strict" neighbour may become so during the process of grouping.

Instead $(a, e') \ll (b, i')$ because it follows from 18 that there is no
precedent of pr(b, i') which succeeds pr(a, e'). The fact that
$(a, e') \ll (b, i')$, or that (a, e') is an immediate precedent of (b, i'),
means that these processes can be grouped because in this case the output from
pr(a, e') is available to pr(b, i') when they are in the same group.

Thus feasible solutions of the problem how further to group the pairs studied
are in this case (provided there is enough memory space)

either

17   pr(a, e') $\cup$ pr(b, i'),   saving 10
     pr(c, j') $\cup$ pr(d, l'),   saving 0

or

18   pr(a, e'),                    saving 0
     pr(b, i') $\cup$ pr(c, j'),   saving 10
     pr(d, l'),                    saving 0

19 Thus we can only come down, by grouping by pairs of the earlier pairs, to
a transport = 90.

20 Thus with this grouping (which happens to correspond to grouping by the
subsets defined earlier 3)) the total transport goes down to 100, corresponding
to the much improved transport factor of 100/59 ≈ 1.7 (as compared with the
topological transport factor of 3.3).


1) See 25.1-14b.

2) Cf. section 12.2 for the definition of these concepts.

3) See 25.1-14b.

21 The grouping of computations may call for shorter data blocks in the files
in order to leave memory space for the group programs. This will then lead to
an increase of transport volume which will also have to be considered.

22 In the general case of practical systems one will also have to test for
different combinations [...] requirements for the [...].

We have seen that the subset relations (such as $P(e) \subset P(a)$) give
valuable [...]


23 Example. Let $P^{00}$ be a precedence matrix for the computations a, b, c, d
[matrix entries lost in extraction] and suppose all file volumes = 1.

The transport saving that can be obtained through grouping some process with a
is greatest for the pair a $\cup$ b, for which it is 3. The total saving
possible is then

3 for a $\cup$ b plus 0 for c $\cup$ d, total saving = 3.

The alternative, 2 for a $\cup$ d plus 2 for b $\cup$ c, gives a saving = 4 and
is thus better.
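
The point of this example, that pairing each process with its individually best partner need not give the best total saving, can be reproduced with a small search. As the matrix of the example is not available here, the precedent sets below are hypothetical; they are chosen only so that the pairwise savings agree with the figures 3, 0, 2 and 2 quoted above, with all file volumes equal to 1.

```python
from itertools import combinations

# Hypothetical precedent sets (files numbered 1..9, all volumes = 1).
P = {"a": {1, 2, 3, 4, 5}, "b": {1, 2, 3, 6, 7},
     "c": {6, 7, 8}, "d": {4, 5, 9}}


def saving(x, y):
    """Saving from grouping x and y: number of common input files."""
    return len(P[x] & P[y])


def total(pairs):
    return sum(saving(x, y) for x, y in pairs)


def greedy_pairing(items):
    """Repeatedly pick the remaining pair with the largest saving."""
    remaining, chosen = set(items), []
    while len(remaining) >= 2:
        best = max(combinations(sorted(remaining), 2),
                   key=lambda p: saving(*p))
        chosen.append(best)
        remaining -= set(best)
    return chosen


def pairings(items):
    """Enumerate all ways to split the items into pairs."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail


greedy = greedy_pairing(P)
best = max(pairings(sorted(P)), key=total)
print(greedy, total(greedy))
print(best, total(best))
```

For larger systems such an exhaustive enumeration quickly becomes impractical, which is why the restriction to "strict" neighbours stated earlier matters.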

We have discussed in a superficial way the possible methods for efficient
[...]. A formulation of these problems is sketched in chapter 214, but no
numerical procedures for solution are given so far.

24 [...] it is necessary to remember that when information is represented by
files, the sequence in which the records come in the file is of importance for
the information that can be drawn off the file. The sequence - or the
ordering - of a file may have to depend on the process or on the other files
involved in the process. Two processes which require the file to be in
different ordering may therefore not be grouped (unless in some specific cases)
[...] sequence. When a file changes its ordering it shall also have its name
changed.

1) of section 28.3

7. Incidence matrix of process and transput


We have seen that the precedence relations and the precedence matrix P are
suitable to describe some of the relations of our study. However, some facts
could not be described in that way. Thus the number of file passes in a process
could not be specified in $P^{00}$ for both input and output files. Further, P
does not offer a convenient description of how computations are grouped into
composite processes. Both these deficiencies are remedied if we introduce the
concept of incidence between process and input and output. In this way the
precedence matrix is replaced by the incidence matrix $E^{10}$. The incidence
concept considers not only precedence relations among data sets but shows how
any process takes some data sets as precedents (or input) and other data sets
as succedents (or output).

In the incidence matrix $E^{10}$ one row is taken for each process (or
computation) and one column for each data set. The number of times that a file
is treated by a process is used as the incidence number between the file and
the process. Incidence numbers are given different signs for input and output.
Here we use minus-sign to indicate output. 2)

1 To start with a simple illustration let us take the computations

comp(a)  : P(a)  = e, f, g, h;
comp(e') : P(e') = e, f, g;

illustrated by the first two columns in $P^{00}$ 1), and write the corresponding
incidence matrix $E^{10}$

2
                         a   b   c   d   e  e'   f   g   h   i  i'   j  j'   k   l  l'
$E^{10}$ =  comp(a)     -1   0   0   0   1   0   1   1   1   0   0   0   0   0   0   0
            comp(e')     0   0   0   0   1  -1   1   1   0   0   0   0   0   0   0   0

3 Note that we have now written numbers for both input and output so that by
replacing the units by any appropriate number we are in $E^{10}$ able to
indicate any number of input and output scans.

4 Note further that in $E^{10}$, as well as in $P^{00}$, we can easily see the
fact that $P(e') \subset P(a)$ and also that consolidating for instance g and h
would cause a deadweight transport of h in comp(e').

5 Also, a row in $E^{10}$ indicates clearly the number of input and output sets
for the associated process.

6 Now suppose that we decide to group comp(a) and comp(e') into one process
pr(a, e'). This decision could have been made on inspection of $P^{00}$ or
$E^{10}$.

It can be performed by replacing the rows comp(a) and comp(e') by one single
row "pr(a, e')":

                         a   b   c   d   e  e'   f   g   h   i  i'   j  j'   k   l  l'
$E^{10}$ =  pr(a, e')   -1   0   0   0   1  -1   1   1   1   0   0   0   0   0   0   0

1) See 25.1-32

2) Notice that this sign-convention is opposite to the more common one used in
   chapter [...], p. 99. We make this change here simply because outputs are
   usually fewer than inputs and thus we minimize the number of minus-signs to
   write in the matrices.
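
The replacement of two rows by one grouped row can be sketched as follows. The merge rule used here is our own reading of the grouped row for pr(a, e'): an input file shared by both computations is scanned once (the larger of the two incidence numbers is kept, not their sum), and outputs keep their minus-sign. The file volumes are those of the example; the function names are ours.

```python
FILES = ["a", "b", "c", "d", "e", "e'", "f", "g",
         "h", "i", "i'", "j", "j'", "k", "l", "l'"]
VOLUME = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 10, "e'": 10, "f": 2, "g": 1,
          "h": 1, "i": 10, "i'": 10, "j": 20, "j'": 20, "k": 2, "l": 10,
          "l'": 10}


def row(inputs, outputs):
    """One row of the incidence matrix: +scans for inputs, -scans for outputs."""
    r = {f: 0 for f in FILES}
    for f, n in inputs.items():
        r[f] = n
    for f, n in outputs.items():
        r[f] = -n
    return r


def group(r1, r2):
    """Merge two rows into one process row (assumed rule: shared inputs are
    scanned once, so keep the larger count; outputs stay negative)."""
    merged = {}
    for f in FILES:
        x, y = r1[f], r2[f]
        merged[f] = min(x, y) if min(x, y) < 0 else max(x, y)
    return merged


def transport(r):
    """Transport volume of a row: absolute incidence times file volume."""
    return sum(abs(n) * VOLUME[f] for f, n in r.items())


comp_a = row({"e": 1, "f": 1, "g": 1, "h": 1}, {"a": 1})
comp_e1 = row({"e": 1, "f": 1, "g": 1}, {"e'": 1})
pr_a_e1 = group(comp_a, comp_e1)

print(transport(comp_a) + transport(comp_e1), transport(pr_a_e1))
```

The difference between the two printed values is the saving of 13 found earlier for grouping a with e'.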


7 We may also introduce a change in the graphical representation which is
analogous to the change in passing from $P^{00}$ to $E^{10}$. Thus we introduce
also in the graph a representation of the process (or computation):



We have used a rectangular element (or a line element) to represent the
computations or processes, whereas the data sets are represented as points.
This is because processes have two ends, the input and the output end, like a
line has. Thus data sets are represented as 0-dimensional entities and
processes as 1-dimensional ones. It was therefore that $P^{00}$, being a
relation between 0-dimensional entities, has got two superscripts of 0, whereas
in $E^{10}$ the first superscript (the one associated with the rows of the
matrix) is 1 to indicate that the matrix rows are associated with the
1-dimensional objects called processes.

When the two computations are grouped into pr(a, e') this is illustrated by
fig. 1.

Now we can write down the incidence matrix corresponding to P of 25-32.
Thereby we assume that the output set d in pr(d, l') has to be scanned the same
number of times as its input c, i.e. five times. This fact could not have been
indicated in $P^{00}$. We make the assumption in writing down $E^{10}$ that we
have decided to group all computations which have input sets being subsets of
input sets of other computations to these latter computations. Thus pr(e') is
grouped to pr(a), giving pr(a, e'), and similarly we obtain pr(b, i'),
pr(c, j') and pr(d, l').


8  $E^{10}$ =  pr(a, e')
               pr(b, i')
               pr(c, j')
               pr(d, l')

   File vol.   1  10  1  10  2  20  1  10  10  2  1  1  10  20  2  10

Total file volume = 111

Note:

In 6 we have indicated e' as input instead of e in all processes except pr(a, e'),
as it should be (likewise for i', j', l').

In 6 we have also added a row (the last one) to indicate the assumed file volumes.
In fig. 3 we have drawn the graph corresponding to $E^{10}$ of 6.



Exercise.

Show that the transport factors $\eta$ and $\eta_{top}$ can be computed by

$$\eta = \frac{1^T \, |E^{10}| \, v}{1^T \, s}$$

(cf. sections 27.4-1 and 27.5-4)

(v = file volume vector,
s = file size vector)

and

$$\eta_{top} = \frac{1^T \, |E^{10}| \, s}{1^T \, s}$$

($|E^{10}|$ = matrix obtained from $E^{10}$ by replacing all minus
signs by plus signs.)
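A small numerical sketch may help with the exercise. It assumes that the transport factor is the total transported volume (every incidence number taken with plus sign and weighted by the volume of its file) divided by the total file size; that reading, the function name `transport_factor` and the toy numbers are assumptions made for illustration only.

```python
def transport_factor(E, v, s):
    """Total transported volume divided by total file size.

    E: incidence matrix as a list of rows (one row per process),
    v: volume per file, s: size per file (same column order as E).
    """
    transported = sum(abs(e) * vol for r in E for e, vol in zip(r, v))
    return transported / sum(s)


# Hypothetical toy system: three files, two processes.
E = [[1, 1, -1],   # process 1 reads files 0 and 1, writes file 2
     [1, 0, -1]]   # process 2 reads file 0, writes file 2
s = [10, 20, 10]   # file sizes
v = s              # volumes taken equal to sizes in this toy case

factor = transport_factor(E, v, s)
print(factor)
```

With v = s the same function gives the purely topological factor; a volume vector larger than s raises the result in proportion.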
1. Effect of process grouping on the transport factor. 1)


Grouping of two processes will reduce the transput volume if the two processes
have some data files in common, as input or output.

Whether files are used in several processes is seen directly on scanning the
incidence matrix $E^{10}$ for the file system, for then the columns in $E^{10}$ will
contain more than one nonzero entry.

When some files are associated with several processes then of course more than 
two processes will be considered within the same grouping. This will generally 
make a large number of combinations possible and the scanning of all of them 
may be too much even for a computer. In practical cases it is expected however 
that different kinds of restrictions will reduce this complication considerably. 

One typical restriction is memory space. 

One may often be interested in estimating the effect of marginal increases of 
memory size. In the first steps of such a procedure only a few combinations 
will be possible so that then a complete analysis seems quite feasible. 

One might for instance determine the smallest marginal memory increase 
which will enable any process grouping. For this one may then compute the best 
total set of groupings in the system which it will enable. From this a change in 
transport volume will be computed. As a second step one may then compute the 
possible reduction of transport volume which would instead be obtained if the 
increased memory were used to reduce file volume through using longer tape 
blocks (or through more data packing). There is of course also the alternative 
of using memory space to speed up internal processing. This might of course 
save more time. We disregard this possibility here because we have chosen to 
study the data transport only. In a realistic situation some consideration must 
be given to this, of course. This procedure may then be repeated for a second 
step of marginal memory incrementing. 

It is of course possible that, in a given file processing system design, an
improvement of the transport factor can be obtained without incrementing memory.
If this is so, it is however because earlier steps in determining optimum process
grouping have not been completed or were not correctly performed.

Just to make life more complicated we have also to remember that there is one
more possibility for exchanging memory space for time. This is when the
computer permits simultaneous writing and reading on tapes. To make use of this
we need double memory areas for both reading and writing (if we want to avoid
complications). To see when this possibility should not be used let us assume the
record length is x × g, where g = the record gap. If simultaneous read/write is
not used, space is saved which would permit using double record length. (This
again could well be impossible because of the extra program space called for by
blocking records. But for the moment this is disregarded.)

The length on tape allocated to one record is

g + x × g = (1 + x) g

If instead we use double length records we get (for the length of one physical block)

g + 2x × g = (1 + 2x) g

which means that a single (logical) record takes (1/2 + x) g of tape length in total.
We see that only for x = 0, i. e. for zero record length, the tape length per record
goes down to one half, thus doubling effective tape speed. Whenever x is greater
than 0, simultaneous read/write is better than blocking of records (as it will
always lead to roughly doubled effective tape speed, regardless of the value of x).

1) See also 27.6-19.
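The comparison can be checked numerically. The sketch below only restates the two tape-length formulas; the function names and the sample values of x are chosen for illustration.

```python
def tape_per_record(x, g=1.0, blocking=1):
    """Tape length per logical record when `blocking` records of
    length x*g each share one record gap g."""
    return (g + blocking * x * g) / blocking


def speedup_from_double_blocking(x):
    """Gain in effective tape speed when going from single to double
    length records; simultaneous read/write gives roughly 2 for any x."""
    return tape_per_record(x, blocking=1) / tape_per_record(x, blocking=2)


for x in (0, 1, 4):
    print(x, speedup_from_double_blocking(x))
```

The gain equals 2 only at x = 0 and falls towards 1 as the record grows relative to the gap.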

On the other hand omitting simultaneous read/write may in some cases save more
than two file passages by leaving space for process grouping, in which case no
simultaneous read/write should be used.

Obviously grouping must be by more than two, i. e. by three at least, in order
to be better than simultaneous I/O.

Simultaneous I/O requires two extra record areas for each file pair. Grouping
by three requires three program spaces, that is two spaces extra. It is seen that
if n I/O file pairs are involved, and the average record area is r and the average
program area is p, then the extra space is 2p for grouping against 2n · r for
simultaneity, so that only when

p < n · r

will grouping be possible with less space than is called for by simultaneity
of I/O (and be saving as much time).
2. Memory requirement associated with process grouping.


If we want to group several processes associated with the same file, to save
file scans, we need memory space. Thus each process needs a program and a work
area for the transaction item calling the process. Thus when several processes
are grouped into one it is necessary to have program areas for them all. This is
because otherwise each time a transaction item calls a process the corresponding
program would have to be transported into memory. Note however that the data
areas could be shared by all the different transaction items, for these only have
to be in one at a time. [...] we would not have the choice of not grouping, which
[...] part of our problem. It need not be the case, however, that the program
[...] item [...].

As an illustration of this fact it is seen that to save one file scan by grouping
process "a" with another "b" one would call for a transport of "program a" into
memory for each occurrence of an item "a" and then a transport of "program b"
to restore the memory after "a" has been handled.

Thus if i(a) is the number of "groups of transaction" items, i. e. transactions
of type a which follow each other densely so that no item of other type comes
between them, and if prog(a) and prog(b) are the transport volumes for bringing
into memory (for each group "a") the program a and (to restore after the group
"a") the program b, we have the transport volume for program transport
i(a) {prog(a) + prog(b)}. If this transport is less than that of a file scan
then grouping process "a" into the larger process will save transport volume
even though the program for "a" cannot be kept continuously in memory during the
whole run. [...]

i(a) {prog(a) + prog(b)} < i(f) vol(f)

i(a) < i(f) vol(f) / {prog(a) + prog(b)} - a practical margin.


Example.

[...] the transport volume of a program for an elementary process is 4000 bits and
for a file record 1000 bits. In that case the condition would be

i(a) < i(f) · 1000/8000 - margin, or i(a)/i(f) < 1/8 - margin.

In that case it would be of advantage to "group in" "a" if the "transaction
frequency for a" would be less than 12 % - margin - for instance it may be
[...]. This is not uncommon. One might therefore estimate that grouping will
often pay even when all programs cannot be held simultaneously. 1)


1) See example 24,1-18 
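The condition i(a) {prog(a) + prog(b)} < i(f) vol(f) is easy to evaluate for given figures. The sketch below uses the numbers of the example (4000 bits per program, 1000 bits per file record); the function name and the file length of 10 000 records are assumptions made for illustration.

```python
def max_transaction_groups(i_f, vol_f, prog_a, prog_b, margin=0.0):
    """Largest number of transaction groups i(a) for which bringing
    program a in and restoring program b afterwards still costs less
    transport than one extra scan of the file f."""
    return i_f * vol_f / (prog_a + prog_b) - margin


# Figures from the example; the file length is a hypothetical value.
limit = max_transaction_groups(i_f=10_000, vol_f=1000,
                               prog_a=4000, prog_b=4000)
print(limit)            # groups of "a" items allowed per run
print(limit / 10_000)   # the same limit as a fraction of the file records
```

The fraction 0.125 is the "12 %" of the example before any practical margin is deducted.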
As we need to have for instance a tape unit available for the programs of a
group of processes, if the memory cannot hold them all, one must make sure
that the gain is worth the cost. If in any case an auxiliary unit (a tape for
instance) is used to hold the program library, there may be no cost. Then one
may still find that if 2 is satisfied, but only with a small margin, the realistic
options still will be only either to require enough memory or not to use grouping
of "a" into the other process.

If not only (a) but several programs, for instance also (c), (d), ... are grouped in
while stored on tape, then each call for (a) may cause not only the transport
prog(a) but also one or more of prog(c), prog(d), ... may have to be scanned.
How much this will be will depend on the statistical relations between the calls
for a, b, ..., that is, it will depend on the stochastic occurrences of the
transactions of types a, b, c, ...

As it is common to have the total transaction frequency for even heavily grouped
processes well below 100 %, and as it is realistic to expect something like the
well-known 80-20 per cent rule to hold often, one will have to expect that about
80 % of the transaction types together account for only about 20 % of the
occurrences. This suggests that some of the transaction types make up less than
10 % of the total transaction frequency. If total transaction frequency is less
than 30 % (a very common situation) this means that some transactions should be
expected to have a proper frequency of only about 3 %.

The conclusion is that when accounting for memory requirements for possible
grouping of processes one should always also list expected transaction
frequencies, for one has to expect that in many cases this will indicate the
requirement to be less severe than would otherwise be the impression.

These observations also indicate that auxiliary memories like magnetic drums, 
and even tapes, may be of great interest in connection with process grouping 
so long as they are significantly less expensive than the central memory. This, 
on the other hand, is not always the case, for we may have to add an extra cost 
for attaching another type of equipment to a system. 

The grouping together of two or more processes naturally has some effects on
the computer programs. If a set of different transactions are input to the grouped
processes working on one master file (or several files) which they are all
concerned with, then each time a transaction is read in the program has to analyze
the type of the transaction. One of the transaction types may be associated with
only one of the processes, and the analyzing program on recognizing this
transaction record type will set up a "program switch" to the process in question.
Another transaction type will need a switch to another process program while
a third type may switch in both programs, one after the other.
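The "program switch" can be pictured as a table from transaction type to the process programs it calls. The following is a minimal sketch only; the type codes "A", "B", "C" and the two stand-in process programs are hypothetical.

```python
def run_grouped_process(transactions, switch):
    """Analyze the type of each transaction and switch in the process
    programs associated with that type, one after the other."""
    log = []
    for t_type, payload in transactions:
        for program in switch[t_type]:
            log.append(program(payload))
    return log


def proc_1(item):
    """Stand-in for the first process program."""
    return ("p1", item)


def proc_2(item):
    """Stand-in for the second process program."""
    return ("p2", item)


# Type A calls one process, type B the other, type C both in turn.
SWITCH = {"A": [proc_1], "B": [proc_2], "C": [proc_1, proc_2]}

result = run_grouped_process([("A", 1), ("C", 2), ("B", 3)], SWITCH)
print(result)
```

The master file is scanned once while every transaction still reaches exactly the programs its type requires.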
3. Computer programs and memory space for a process.


Knowing that the possibility of saving file transport by grouping processes
depends on whether memory space is sufficient for the need of all processes in
the group we come to the important question of how to determine the memory
space needed by a process. This memory need is made up of several distinct
components. 1)

0 Space for operative program system.

1 Buffer areas for the data needed by the process. Some of these are in common
with other processes in the group, for otherwise there would be no reason for
grouping. They do not then have to be counted when we test for possible
inclusion into a group. Simultaneous I/O and processing calls for duplicated
buffer areas.

2 If the process uses I/O equipment not used by other members in the group, this
may require some I/O routines to be added to the executive program subroutines.

3 Memory space for the programs for the procedures realizing each of the
processes which check data and results.

4 Space for programs that move such data to the updated files, which are not
processed by the procedure.

5 Work areas for the procedure. These will often be shared by other procedures
in the group. To the extent they are not, or exceed the maximum need of the rest
of the group, they are to be accounted for.

6 The most difficult memory need to determine is that for the procedure proper. 
This problem has in fact plagued the programmers and computer users for more 
than a decade, with no general solution being produced. 

7 The proper thing to do to solve this problem, or at least to reduce it, seems to
be to try to collect statistics. If the reader feels this could hardly lead to
sufficient accuracy in estimating actual need he should recall that the only
possibility he has today is much worse - he can only guess. The result is that today
people do not tackle the problem at all and are deeply disappointed when they
find their fine, integrated system cannot be implemented because of memory
limitation. Thus today even a systematic guessing, in combination with the
grouping space analysis we propose here, is a real improvement. If statistical
data could be used to even slightly improve our guess it is very desirable.

8 Two things serve to make the use of statistical data for process space
estimation seem promising today. One is that our technique of analyzing the
information need down to elementary files and elementary processes, or computations,
makes it much more feasible to find similarities with already implemented
systems. Hence the statistical data will be more relevant.

The other reason is that the increasing use of problem-oriented programming
languages such as Cobol and Algol makes it possible to obtain computer
independent statistical data. If we have an estimate as to how many Algol statements,
for instance, are included in some implemented procedures corresponding
to an elementary process then we may be able to make a good estimate of how
many Algol statements we are going to have. We will then, as a rule, have usable
estimates of how many machine instructions our specific compiler will produce
from these Algol statements.

9 In fact the availability of modern efficient procedure languages also helps in
another way to solve our problem. Thus we will be able to write down a whole
first draft of an Algol procedure for the process we want to estimate, as this
can often be done in fairly short time, which is not the case with machine-oriented
programs.

1) It is worth noting that the optimum problem is simpler for very small or
very large memories. In small memories alternatives are few, and in large
ones we can do anything of advantage.


10 Also the space according to 4 is not so simple to determine. It depends on how
many elementary files are consolidated into the master files. 1) Also the
program for moving any specific elementary file (or data group) can, in principle
at least, be used by any process which does not do computation work on that
specific elementary file. Such programs therefore are common (or should be)
to many of the elementary processes.

11 If for instance the master file contains the elementary files A_i (i = 1, 2, 3, ..., k)
and process p_1 uses A_1 and A_5 while process p_2 uses A_3, A_4 and A_5 we need
programs like


12 pr(p_1);

move A_2 to new master; move A_3 to new master;
move A_4 to new master; move A_6 ...; move A_k ...;


and

13 pr(p_2);

move A_1 to new master; move A_2 to new master;
move A_6 to new master; move A_k to ...;


We see how several move programs are common to both processes.


14 When different computations are grouped into one process then there will also
have to be an administration program which co-ordinates the working of the
different computations. In general the situation is that when one input
transaction record is read in it will be of a type which calls for a specific computation,
and that one alone. The co-ordinating program will sense the type and switch in
the appropriate program which then performs the computation and the
corresponding move operations. It is natural to expect that the co-ordinating administration
program will have one part which is common for the group (we can call it
"adm(group)") and in addition one part for each of the computations (such as
"adm(1)" for the computation p1 and "adm(2)" for p2).


We can set up this (for the example above) in a tabular form, using one selection
column for p1 and one for p2.


1) See chapter 9. 
Program        p1    p2

adm(group)     x     x
adm(1)         x
adm(2)               x
mv(1)                x
mv(2)          x     x
mv(3)          x
mv(4)          x
mv(5)

Tableau showing different programs involved in p1 and p2 respectively.


Thus if p2 is already in a grouped process the addition of p1 into the group adds
a memory requirement:

adm(p1) + pr(p1) + mv(3) + mv(4)

instead of only pr(p1).

pr(p1) may already be known. (It may for instance exist as an Algol procedure.)
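The bookkeeping behind this sum is a set difference: whatever the new process needs and the group does not already hold. The sketch below assumes the component sets implied by the move lists 12 and 13 (p1 moves A2, A3, A4; p2 moves A1, A2); the sizes in `SIZE` are invented for illustration and are not taken from the text.

```python
# Program components needed by each process (our reading of 12 and 13).
PROGRAMS = {
    "p1": {"adm(group)", "adm(1)", "pr(p1)", "mv(2)", "mv(3)", "mv(4)"},
    "p2": {"adm(group)", "adm(2)", "pr(p2)", "mv(1)", "mv(2)"},
}

# Hypothetical sizes in memory positions, for illustration only.
SIZE = {"adm(group)": 100, "adm(1)": 50, "adm(2)": 50,
        "pr(p1)": 400, "pr(p2)": 300,
        "mv(1)": 20, "mv(2)": 20, "mv(3)": 20, "mv(4)": 20}


def added_memory(new, group):
    """Components, and their total size, that `new` adds to `group`."""
    present = set().union(*(PROGRAMS[p] for p in group))
    extra = PROGRAMS[new] - present
    return sum(SIZE[c] for c in extra), extra


cost, extra = added_memory("p1", ["p2"])
print(sorted(extra), cost)
```

adm(group) and mv(2) are already held for p2, so only the four remaining components are charged to p1.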
4. Example of process grouping with memory limitation.


1 trp factor = 255/100 = 2.55 1)


We make the simplifying assumption that all record sizes are 100 memory
positions. We also assume that a buffer area for one record is included in the
"program space" given in the right-hand column of 1. Thus, as long as the
blocking factor is = 1 we need not allocate additional memory for buffers.


We find from the table 1 that we need a memory size of 2000 at least, for this
is seen in row "j" of 1 to be required for process "j" alone. Now it follows that,
as process "i" only requires 1000, it is possible to group any process requiring
1000 at most with pr(i) when 2000 positions of memory are used. We can for
instance group "f" with "i", which requires the memory space 1800. If we do so
we save the operation of inputting f' for pr(i). We still need to input "f" for
pr(f) and to output "f'". We have thus saved one scan of f' which, from 1, is
seen to save a transport volume of 10. 2)

We now find that any of the processes pr(a) through pr(e) if grouped in pairs
will save one scan of both "f" and "i'", or 20. It is possible that we might be
able to form two such pairs within the memory space of 2000. We try this, and
find in fact that pr(a ∪ e) requires 2000 and pr(b ∪ d) requires 1900, so both
are feasible.

1) Comparing with p. 283,1 and 278,14, we find that our result for "trp factor"
should have been multiplied by $\eta_{tv}$ (the transport volume factor) which tells
how much larger is the file volume than the file size. We ignore this
refinement in this example as it is of no significance for the operations we want to
illustrate here.
We see that our reasoning in words is very difficult to follow up or review.
Therefore while we are not able to present an algorithm for the analysis we
show that it is very advantageous nevertheless to attempt a concise
presentation which gives us a chance to see in total what we are doing. This is true
even if we only have such simple means as to use some shorthand notation and
use a table to express our results, as they are produced. We give in 3 a table
in the uppermost block (of 4 lines) of which the reader will find the decisions
we have made so far and the resulting gain in saved transport.

In the next block in 3 we try another solution which still uses the minimum
2000 of memory space. This time we see if we can use the facility of
simultaneous I/O, which we assume is available on the computer we plan to use. The
reason we tried without using this facility was, of course, that we wanted to
use the very minimum of memory space, which does not permit simultaneous
I/O for pr(j). While we are thus still not able to use simultaneous I/O for pr(j)
we now see if it can be used to advantage in the other processes. We cannot
be sure, for the space required for alternate buffer areas by simultaneity
may prevent us from using some grouping of processes.

We find in the second block of 3 that in fact simultaneous I/O is efficient. The
total saving is 60, bringing the transport factor down to 1.95 (from 2.55 at the
outset). 1) In the uppermost block, where simultaneity was not used, we saved
only 50, making the transport factor = 2.05.

We believe the succeeding steps can be read directly from 3 without need for
further comments.
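The arithmetic of the first two blocks can be replayed in a few lines. The sketch assumes that the transport factor is (255 - saved)/100; the denominator 100 is inferred from 255 and 2.55 rather than stated in the text, and the function names are hypothetical. The memory figures are those quoted for the first block.

```python
BASE_TRANSPORT = 255    # transport volume at the outset
TOTAL_FILE_SIZE = 100   # assumption: inferred from 255 / 2.55


def factor_after(saved):
    """Transport factor once `saved` units of transport are removed."""
    return (BASE_TRANSPORT - saved) / TOTAL_FILE_SIZE


def feasible(groups, need, memory):
    """A set of groupings is feasible when every group fits in memory."""
    return all(need[g] <= memory for g in groups)


# Memory needs of the first block: m(i u f), m(a u e), m(b u d).
need = {"i+f": 1800, "a+e": 2000, "b+d": 1900}

print(feasible(need, need, memory=2000))   # all three groups fit
print(factor_after(50))                    # without simultaneous I/O
print(factor_after(60))                    # with simultaneous I/O
```

Lowering the memory to 1900 makes the first block infeasible, since pr(a ∪ e) alone needs 2000.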

2 Let m(x) = memory req. for pr. x
m = memory available.

The possibilities for blocking records are not considered.


1) We use the assumption here that simultaneous I/O eliminates the
transport for one of the simultaneous files. Cf. however the footnote on 28.1.3.

2) Notice that if we group "f" and "i" we must assume that all processes a to e have
been finished before processes "f" and "i" get started. Note also that the process
"f" (updating file f by itself) is empty in most practical applications. We have
added it here for theoretical completeness.
3

Memory  Decisions                                          Trp vol   Trp
                                                           saved     factor

2000    Minimum space for proposed system
        m(i ∪ f) = 1800                                    10
        m(a ∪ e) = 2000                                    20
        m(b ∪ d) = 1900                                    20        2.05

        Allocate alternative areas for f-f'.
        Req's m_alt = 200, block factor = 1.
        Saves trp. vol = 10 per f-f' run. Reduces
        sav. from grouping f-f' runs by 10.
        Req's m_alt = 200 also for i ∪ x, if x any of a-f.
        ∴ m = 1800; m(i ∪ f) = 1800                        0+10
        m(a ∪ b) = 1800                                    10+10
        m(c ∪ d) = 1400                                    10+10     1.95

2500    2300 = m; m - m(i) = 1300
        m(i ∪ b) = 2300                                    10+10
        m(e ∪ f) = 2300                                    10+10
        m(a ∪ c ∪ d) = 1900                                20+10     1.85

3000    m(i ∪ j) = 3000                                    45
        m - m_alt = 2800; m(b ∪ e) = 2800                  10+10
        m(a ∪ …) = 2700                                    30+10     1.55

        Further grouping requires the elimination
        of one group above. This could be done by
        m(i ∪ j ∪ b) = 4300                                55

4400    m(a ∪ c ∪ d ∪ f ∪ e) + m_alt = 4400                40+10     1.50

        To group further we need to increase to m =
        = 4400 + 4300. However record blocking could
        be done with less increase of memory. Also,
        for this high degree of grouping file
        consolidation would save transport.

4 Problem.

Discuss the consequence in problem 1 of adding a requirement of common space
600 in a, b, c (meaning that grouping b to a does not call for an addition of
[...], only [...] = vol(a) + vol(b) + 600). Discuss for memory = 2000 and
2500.


5 Problem.

Discuss what modification to problem 1 would be feasible if e occurs with only
1/20 of the frequency of the other items.


6 Remark. In a practical system the number of elementary input files will be
much greater than in this example. A fairly realistic number would be about
ten times as many. On the other hand it is also normal for the program space
requirement for each elementary process to be ten times smaller. Hence our
example is fairly similar to a real situation where we have already done
consolidations so that a, b, c, ... can be considered as being consolidated from
ten elementary files on the average. This first stage of consolidation may
well have been done as guided by a set of commonly used subprocesses and
thus commonly used memory areas 1). This first stage could also be done
without regard for memory as it was to lead to program sizes still well below
the planned minimum memory (of 2000 positions).


1) 2 Cf. 28. 3. 3 (11, 12, 13). 
Chapter 9. Transput equipment consideration.


1. Reducing the number of transput equipment.


The precedence matrix $P^{00}$ or the incidence matrix $E^{10}$ also gives information
about the input and output [...] that [...] require. 1) Thus the column for a
[...] that in addition to the output unit for a there is needed input units for
e, f, g and h. Thus in total five units would be needed for this computation.
If it is grouped with e' there [...] be a unit for e' as well, thus increasing
[...].

The group a, e', d, l' similarly is seen to call for twelve transput units.

[...] we may have to take together (or consolidate) several of the elementary
files [...] grouping [...] transport. Hence [...] eliminate such excessive
transport [...] have a still greater beneficial effect than we have seen
above. [...] combining the basic records to composite records. 3)

[...]

[...] deadweight. Similarly [...] transport h as a deadweight. Thus a deadweight
transport of 2h + [...] g as a deadweight. [...] a and e' will eliminate one of
these transports [...]. Grouping a, e' with d, l' will eliminate also the
excessive [...] deadweight transport caused by consolidating g and h [...]
consolidation of j [...] no deadweight [...] of j and one of j', so that in that
case the resulting deadweight transport [...] 40. Grouping of b, i, c, j on the
other hand would enable i, j, k to be consolidated without any deadweight
transport at all. This is easily seen from $P^{00}$.

1) $E^{10}$ as given in 27.3-6.

2) [...] number of transput units [...] that will result if the files are
sufficiently small [...].

3) See 212.

When we consider consolidation of files, the picture of excessive data transport
becomes somewhat complicated, being composed of duplicated input of some
files to several processes, and of deadweight transport of some data in these
files which are not utilized as input in some of the processes to which they are
fed together with the true input data for those processes.

However, the total excessive transport is very easily detected, being simply
the number of passes of that file minus one. Thus we need not keep record of
the different kinds of excessive transport.

Consolidation of files, even to the extent that one file record covers several
blocks on a direct access memory, may reduce access time because neighbouring
blocks may be accessed much quicker than random blocks (pseudo-direct
access). Whether this is the case or not will depend on the circumstances and
cannot be decided in advance.

Likewise if two files are small enough to be carried on one tape reel, to
consolidate them - but also to put them after each other - may save transport by
eliminating a need for tape reel set-up.


Consolidation of files. 

We talk of consolidation of files when we take two or more files together on
the same file medium, e. g. mass memory, and thereby arrange the individual
records in the sorting sequence specified for the file. We notice that we can
put some files on the same magnetic tape reel without having a consolidated
file, for if we put first the whole of one file and then another file these files
are not consolidated according to our definition. We shall then say that such
files are concatenated.

In consolidating files we may retain the records of the individual files as
separate records of different types. In this case we must add to each record a data
term carrying the type code. We call this arrangement consolidation by record
collection. This is the way punched card files of transactions are commonly
consolidated. Alternatively we may consolidate files by taking their individual
records together to form consolidated records. This is the common way to
consolidate standing files.

To illustrate the different ways of consolidating files let us suppose we have
a file "a" such that each record of "a" contains a name "i" (or key) and a set of
other terms (properties) "a_i". The record number i thus is (i, a_i) with a comma
used to separate the name. Let the file "a" be:

(a) (1, a_1) (2, a_2) (7, a_7) (8, a_8) (13, a_13)

where (a) is indicating a file label identifying the file to be of "the kind a".
Notice that a_1 etc. does not contain the label "a".
Let another file "b" be:

(b) (1, b_1) (2, b_2) (3, b_3) (8, b_8) (9, b_9)

Let a b denote the file obtained through consolidation by record collection.

Then:

a b = (a b) (a, 1, a_1) (b, 1, b_1) (a, 2, a_2) (b, 2, b_2) (b, 3, b_3) (a, 7, a_7)
      (a, 8, a_8) (b, 8, b_8) (b, 9, b_9) (a, 13, a_13)

Notice that we had to insert the "type codes" "a" or "b" respectively into each
individual record. Thus the data size of a b is greater than the sum of the sizes
of a and b.

Now, let instead "a ∪ b" denote the file obtained by consolidation by record
consolidation:

a ∪ b = (a ∪ b) (1, a_1, b_1) (2, a_2, b_2) (3, -, b_3) (7, a_7, -) (8, a_8, b_8)
        (9, -, b_9) (13, a_13, -)

The hyphen between 3 and b_3 for instance denotes an empty space corresponding
to the size of a_3 (which is missing).

We see that this time we save one occurrence of the record name in each pair
a, b of records. For instance in a b we had the name "1" with both a_1 and b_1.
In a ∪ b it is occurring only once with a_1 and b_1 together. We also save the type
code because in a ∪ b we know where the a-terms and the b-terms are placed.
Instead we see that file storage space is wasted on the missing subrecords (such
as a_3 or b_7) indicated by the hyphens. Instead of leaving empty space for missing
subrecords we may use a type code to indicate what is missing. For instance
we could have

t_0 to indicate that no subrecord is missing
t_1 to indicate that the a-subrecord is missing
t_2 to indicate that the b-subrecord is missing

We would then have

a ∪ b = (a ∪ b) (1, t_0, a_1, b_1) (2, t_0, a_2, b_2) (3, t_1, b_3) (7, t_2, a_7) etc.


With this arrangement the size of a ∪ b will be smaller than the sum of the sizes
of "a" and "b" to the extent that the type code "is smaller" than the name. The
size of a ∪ b in this arrangement will always be smaller than the size of a b
(to the extent that the type code in a ∪ b is smaller than a name plus two type
codes in a b). We see that in a ∪ b we have records of varying lengths.
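The two ways of consolidating can be sketched with the files "a" and "b" of the example. The representation chosen here (dictionaries keyed by record name, strings standing for the property terms, `None` for a missing subrecord) and the function names are assumptions made for illustration.

```python
# The example files: record name -> property terms.
a = {1: "a1", 2: "a2", 7: "a7", 8: "a8", 13: "a13"}
b = {1: "b1", 2: "b2", 3: "b3", 8: "b8", 9: "b9"}


def collect(files):
    """Consolidation by record collection: every record is kept,
    tagged with its type code, and sorted on the record name."""
    records = [(name, label, terms)
               for label, f in files.items()
               for name, terms in f.items()]
    records.sort(key=lambda r: (r[0], r[1]))
    return [(label, name, terms) for name, label, terms in records]


def consolidate(fa, fb):
    """Consolidation by record consolidation: one record per name,
    with None marking a missing subrecord."""
    return [(n, fa.get(n), fb.get(n)) for n in sorted(set(fa) | set(fb))]


collected = collect({"a": a, "b": b})
consolidated = consolidate(a, b)
print(collected)
print(consolidated)
```

Record collection yields ten records, each carrying a type code and a name; record consolidation yields seven, each carrying its name once.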

The way the file will be treated by a problem-oriented language such as Cobol
or Algol-Genius will be different for the two kinds of file consolidation. Thus
a "read" verb will make one record at a time available for processing so that
if the file is consolidated by consolidation of records then all data associated
with the same identifier value will be made accessible in one call. Instead if
consolidation is by record collection a program will have to issue a "read" for
each elementary record with the same identifier value and will also have to do
some preparation.

On the other hand, if only one of the elementary records has to be available at 
a time then the file having collected records, rather than consolidated records, 
will require less working space in memory. 
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-— Adaptation to hardware system. 

In the design of an information system we have to consider limitations imposed
by hardware design or hardware cost.

Thus for all practical considerations we can take as an axiom that standing
files and transput files are stored in a storage which is slower than the
main memory of the computer. Therefore we say that the filed data are
transported to and from the main memory. In addition to this general property of
file storage there are mainly two other limitations.

The first is that limited memory space may make it uneconomic to group too
many computations together, because all necessary programs cannot be kept
stored in memory simultaneously. More programs then have to be transported,
and this may be a larger transport volume than the transport it saves. The
second limitation is that the number of file transput units cannot be chosen
large enough to transport all data sets separately. This limitation may again
be a reason for not grouping some computations although this might have saved
data transport. As an example we note that comp(b) needs three input data sets
and one output data set, that is, it would need four transput units 2).
Instead it is seen from E^10 that pr(b, i') needs, for its inputs and outputs,
five transput units in total. Alternatively the number of files can be reduced
by consolidating several files into one, giving rise to deadweight data
transport and extra runs for merging files.

Finally it must be borne in mind that using memory space to group computations,
which will reduce the number of file scans, may necessitate shorter data
blocks which will increase data transport by increasing the number of interblock
gaps which are also a source of deadweight transport.

In addition to all this we will also have to consider the time spent in replacing
files (for instance tape reels) between processes.

An optimum design of the data structure for a system will have to consider all
these limitations and effects and try to establish the different feasible
solutions, which will here be lay-outs which are compatible with memory and
transput equipment limitations, and to find one among these which corresponds
to minimum transport, in other words to find an optimum solution. This is
clearly a complicated problem, and the possibility that the computation program
complications associated with grouping of computations may add one further
difficulty makes the problem a very difficult one, indeed. 5)


1) See section 28.2.

2) See 27.1-3 (Chapter 7, section 1, number 3).

3) 27.3-6

4) See section 28.2.

5) It is however, in general, easy to see ways for simplifying this problem,
in that the system naturally splits into subsystems such that only within a
subsystem is file consolidation and process grouping relevant - at least as
an approximation.





We do not take up here the problem of constructing an efficient algorithm for 
finding an optimum solution. Instead we give some brief study to the much more 
modest problem of finding a feasible solution. 

We introduce the simplification that we assume that first a system of grouping 
of computations has been decided on. Thereafter we look for a way to define a 
consolidation of the files which will reduce the number of transput units to a 
prescribed value. 

In the general case we may have to perform a sequence of repetitions of grouping 
processes and then consolidating files, and then consider a change in grouping 
and so forth. 

We now study the problem of finding a feasible solution by assuming that we have
already decided on the grouping of computations of our previous example as
given by E^10 above. We assume that we have to comply with the limitation of
having only four transput units (in this specific example).

In trying to find a suitable file consolidation (or a feasible solution) we might
start at the initial end, that is with pr(a, e'). 2) We may illustrate the
consolidation construction by listing the input-output relations and then use
parentheses to enclose consolidated sets.

Note that the consolidation will often itself require a separate process. Thus
the transport increase caused by file consolidation will be greater than
indicated by P^00 or E^10 only.


(e) (f, g, h) 
(a) (e', i) 
(i') (g» k ) 
(a) (e" )hlc 


(a) (O 

<b) <n 

(c)<D 

(<*> <n 


Considering which consolidation to choose for the first row in 1 (that is 1_1), we
notice that a is much smaller than e'. Hence, since a is an output that may
have to be printed in a separate process, it may be wise to keep a and e'
separate. Similarly it may seem wise to keep e separate from f, g, h. This leaves
the consolidation of (1_1) as shown. 3)

For (1_2) separation of b is natural for the same reason as given for a in (1_1).

Obvious arguments now lead to (1_3). When we come to (1_4) we see that no
solution can be obtained in a straight-forward way. We will be forced to go back
and start all over again. It is obvious that we may have to iterate this scheme
some times. In the vastly more complicated cases of real applications this may
be a rather complex operation. In the normal way of handling this problem one

1) 27.3-6 

3) The assumptions we make here are somewhat typical but not generally
justified. They could best be regarded as assumptions specified for the
illustrating example being studied.
will most often have done a lot of programming for the earlier procedures
before one reaches a stage where it becomes clear that a restart will be necessary.

5 Therefore in actual practice the manner of analyzing for feasible solutions using 
E or the graph may save man-years of programming. 

6 In most cases a more straight-forward solution procedure may result if we 
start at the terminal sets rather than at the initial sets. 

7 We therefore now try to start at the terminal end, i.e. at the process giving d.

8 First we do some simple calculations. We observe that c and d are passed five
times in pr(d, l'). Hence it is reasonable to try to keep them separate from
other files. If we do this we are forced to consolidate input data such as h with
standing files such as l and e (we assume all files to be updated in some
process, that is e, i, j and l, and only those, are standing files). This means
that for instance the operation pr(h) of copying h~ from punched tape (say) to
h on magnetic tape (pr(h): h~ → h) will instead have to be performed as
pr(h): h~, l, e → h, l, e thus causing two extra passes of l and e. (h~ is used
to denote the h-data when in punched form.) This extra transport has a volume
of 2(vol(l) + vol(e)) = 40. Instead if we would have to consolidate for instance
as much as a, f, g, h with c it means an extra transport of
5 vol(a, f, g, h) = 25 which is far better. On the other hand when punched data
are read in, the accompanying file storage transports may be without
consequence on time. This is mostly true when data transport is either buffered
or handled by time sharing. Therefore it may often be sufficient to consider the
transports studied here. Data transport buffering or time sharing may often
permit us to count only input transput. We have written the produced results
above the arrows. Thus in (9_1) the "d, l'" over the arrow indicates that in the
output set d, a", e", l' only d and l' are produced, a" and e" being only
copied from a and e'.

J &.c) (a,e',l) - d ’ 1 2 > (d)(a," e~, V) 

(i/ J3> 0* k) -— j .> (c) (j', k) 

& > i) (i» g) — 1 > (b) {i ', g) l) 

(a, e, 1) (i,K,h,J) a ^—> (a, e', 1) 2) 


10 Now we rearrange to original sequence and at the same time introduce the
modifications made in files in later stages in all stages (thus for instance f is
to go with i, g, h in a, e', therefore also in b, i').


1) g is introduced into both sides of 9_3 because it is required in pr(c, j)
and we have already (in 9_2) consolidated it with i.

2) l is introduced into 9_4 in order to define at once the file (a, e', l)
used in other processes.
(e, l) (i, g, h, f)


(a, e , 1") 


(a, e ', 1") (i, g, h, f) 

(h~, c) (a, e', 1~) — 
5 


(r, *\ h') (b) 

(h7 C) <T, k') 

<d) <e~, T) 

5 


We have now also cancelled during updating of a file such data which are not
used any more. For instance f is not used after pr(a, e'), which can easily be
seen in 2 because we underlined those data which are used in the process.
Therefore when (i, g, h, f) is updated during pr(b, i') we leave f out. Here for
instance the notation h" is used instead of h although h" = h here. This is
necessary since h and h" occur in different files in the same program, and
similarly for e', k', e" and l" in other files.




In fig. 2 the graph of the feasible solution 11 is shown.

13 In order to estimate the quality of our solution we compare the resulting data
transport volume with that of the system 27.7-8 without transput limitation and
with the basic topological one as well as with the theoretical minimum.


14 

(from 27. 7-8) 

3 16 

15 

(from 11 ) 

3 16 


File volumes 

112 

16 

Total transport volume (14) = 


u tf 

" ( 15 ) = 


19 


4 

5 

10 


3 

4 
10 


j 

2 

2 

20 


1 

2 

5 

10 


17 Total file volume, counting old and updated files, is 111 which constitutes the
theoretical minimum transport.

18 In our example the excess transport because of limited process grouping
is 43 or about 35 % of file volume. The additional transport caused by limitation
to 4 transput units is 62 or about 60 %.


We also compare our total transport volume with that associated with the basic
topological scheme of 25.1. Thereby we have to add 4 times vol(d) because in
27.1 we did not consider 5 scans of d at the output as we have done here. We
therefore obtain the basic transport volume = 249.


20 We have mentioned that whereas the grouping of some processes or computations
into one larger process may reduce the volume of data input transport, this 
saving may be partly lost by the necessity of using shorter blocks. This may be 
necessary for storing all programs needed in the grouped computations. 

21 In order to show that this is actually a problem worth considering when small 
memories are used, let us take up an example which, although much simplified, 
uses fairly realistic data volumes. In the above studies topological structures 
rather than numerical relations were of primary importance. Our present
problem depends largely on size relations between input data, programs and memory
space. In order to be of interest the example has to use fairly realistic values.


22 Example.

We assume that we have two processes, pr(b, i') and pr(c, j'), and we want to
see what can be saved by grouping them to one single process, pr(b, i', c, j').
Let us assume that the number of elementary input files is 10 for each of the
processes, pr(b, i') and pr(c, j'). Further assume that the number of inputs
to pr(b, i', c, j') is 15 rather than 20 so that grouping the two processes
together saves a transport volume corresponding to 5 elementary input files.


23 We further assume that each record in an elementary file contains 30 characters
(or an equivalent amount of information). This would mean that the total input
set for one processing would be 10 x 30 = 300 characters for pr(b, i') or
pr(c, j') and 15 x 30 = 450 characters for pr(b, i', c, j'). These are seen to be
fairly normal sizes. For each of the four computations (for b, i', c and j'
respectively) there will be one program and in addition one program is needed
for the overall organization. Let us assume 2000 characters for each program
which means 5 x 2000 = 10 000 for the grouped process, a normal figure.

24 We now assume that the elementary files are consolidated into three tape files 
with 5 elementary files in each. We thus obtain three tape files, 1, 2 and 3, 
say. For pr(b, i') we read in 1 and 2, for pr(c, j') 2 and 3 and for
pr(b, i', c, j') (assuming three input units) 1, 2 and 3. Each consolidated file
record will then contain 150 characters.

25 We also have to consider the inter-record gaps on the tapes and the memory 
size. We assume the inter-record gap to correspond to 150 characters and we 
assume the memory to be just sufficient to accept the grouped process
pr(b, i', c, j').

26 Under these assumptions we must use tape blocks with only one record. Thus 
the blocklength will be 150 characters and the input size for each process 
step will be 3 x 150 = 450 characters for the three file records and in addition 
the same amount in record gaps will be transported. The resulting volume is
seen to correspond to 900 characters. 

27 Now if we use the separate processes pr(b, i') and pr(c, j'), each run has the 
extra available memory space of the programs for the two computations left 
out. Thus in addition to the earlier input data space of 450 characters we now 
get 2 x 2000 = 4000 characters. This leaves 4450 characters for two input blocks 
in each run. Hence we can use blocks of 2200 characters, say. In this case the 
inter-record gaps represent only about 7 % of the blocks. The volume of the input
transport in this case will be 2 times that of the 300 characters for each run of
pr(b, i') and pr(c, j') or 600 characters for computing one set of b, i', c and
j', that is about 640 with allowance for gaps. Thus in our example the grouped
process will lead to an increase in input transport from 640 to 900 rather than 
the saving which might have been expected. 

28 It is also of interest to note that we have assumed three input tape units in the 
grouped process but only two in the separate processes. If we make the more
"fair" assumption of permitting only two input tapes also for the grouped
process we may assume 225 characters per record in each file to obtain the same
total of 450. Rather than increasing the transport volume this will reduce the 
volume because there will now be only two inter-record gaps (of 150 characters) 
against 3 gaps above. In this way the transport volume is reduced from 900 to 
750. This is still more than the 600 needed for two separate processes. 
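
The arithmetic of this example can be retraced with a short sketch. The
function name `transport_per_step` and its parameters are assumptions made for
illustration; the figures (records of 150 or 225 characters, gaps of 150
characters, blocks of 2200 characters) are those quoted in the example. The
sketch assumes that each block of data is followed by exactly one gap.

```python
GAP = 150  # inter-record gap expressed in characters, as assumed in the example

def transport_per_step(n_tapes, record_chars, block_chars, gap=GAP):
    """Characters moved for one processing step, gaps included.

    Each tape delivers one record of `record_chars` per step, and every
    `block_chars` of data on tape is accompanied by one gap.
    """
    data = n_tapes * record_chars
    return data + data * gap / block_chars

# Grouped process, three tapes, one 150-character record per block.
grouped_three_tapes = transport_per_step(3, 150, 150)
# Grouped process, two tapes, one 225-character record per block.
grouped_two_tapes = transport_per_step(2, 225, 225)
# Two separate runs, each reading two tapes in blocks of 2200 characters.
separate_runs = 2 * transport_per_step(2, 150, 2200)

print(grouped_three_tapes, grouped_two_tapes, separate_runs)
```

Under these assumptions the grouped process moves 900 or 750 characters per
step, while the two separate runs together move roughly 640, which is the
comparison made in the text.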

29 It is easy to see that if the memory size would be increased say by the amount 
of 6000 characters, then the gaps would have negligible influence and the full 
saving of grouping the processes would be obtained. 

30 It may be of interest to see what memory size corresponds to our assumptions. 
We had 5 x 2000 characters for program (including output data) and 450
characters for input data or 10450 characters in all, a not too rare size, but it
is more common to have double the sizes for both total data volume and total
program space, calling, of course, for twice the above memory size.

31 This example is another illustration of the fact, already encountered in 1, that
the design of the information handling system is inherently of an iterative
character. Thus in a real situation similar to that of the example, we might have
started with an assumed block length and on this basis have found it suitable to
group the computations into one single process. After having done this we may
find that we have to use a smaller block which then makes the grouping
uneconomic and a new solution, with two separate processes, may instead have to be
chosen. It is also clearly exhibited by the example that a larger memory size
not only increases computation speed and reduces data transport volume but
also serves to reduce the systems design work, for instance, by removing the
need for some iterations on the design structure.

32 Comparing fig. 2 with fig. 27.7-3 we find that fig. 1, the system with
consolidated files, looks nicer. It is analogous to the type of flow charts used
commonly in data processing systems analysis, but contains, in addition,
information about elementary files. It is important to note, however, that much of
the information given in 27.7 or by E^10 or P^00 is lost in fig. 1 and the
corresponding precedence and incidence matrices. For instance it is not possible
to see from fig. 1 which information is actually used by a process. As in common
flow charts also the information on elementary files is missing, such diagrams
actually are very void of valuable information.
Information System Design Computations.

We have above 1) discussed concepts associated with defining the needs for
information within an organization. The set of all kinds of information needed,
together with the interrelationships between them and the set of processes which
"connect" them constitutes the "basic information system" for that organization.
It was shown that the structure of this basic information system could be
described by an "incidence matrix" E^10, which has one row for each of the
"elementary processes" and one column for each elementary information set, or each
"elementary file" 2). E^10 then gives information about the system, such as
multiple file scan requirement, data origin, preceding and succeeding processes etc.

It was then shown how by grouping several elementary processes together into
one composite process one could reduce the need for multiple input and output
of files. It was also shown that such grouping of processes requires memory
for storing the process programs. Hence memory space available had to be
allocated to process grouping or to other means for reducing total processing
time. It was further shown that for different reasons, such as reducing the
number of tape handlers or better utilization of block space in mass memory, a
consolidation of elementary files into composite ones has to be done. Such a
consolidation may cause increased file input and output. A design problem for the
information system was thus found to consist in finding a set of process groupings
and file consolidations that would satisfy requirements of limited memory space
and input-output units and would minimize "data transport" under these
requirements. The grouping of processes was shown to correspond to joining the
corresponding rows of the incidence matrix E^10 in a certain way. We are going to
show how this can be described as a generalized matrix operation.

We thereby use a way of defining generalized matrix operations introduced in
"System Algebra" 3).

Similarly the consolidation of files is associated with joining columns of E^10.
Again this can be done as a generalized matrix operation.

We also show that the changes (caused by process grouping or file
consolidation) in data transport required by the structure of the information
system, as defined by E^10, can be obtained by correlated matrix operations.

The matrix operations mentioned involve E^10, or the modified versions of it,
which resulted through earlier operations, and also the vector defining file
transport volumes as well as matrices which specify the process groupings
and the file consolidations we want to try.

Any time that we have made a decision on how to group processes and
consolidate files we can easily write down the matrices specifying this decision.
Then a set of matrix routines will suffice for us to come to know what the
resulting data transport and memory space will be. The matrix routines can easily
be performed on a computer if a suitable generalized matrix program system is
available. We have thus achieved a convenient method for comparing different

1) Chapter 25. 

2) Section 27. 7 

3) Chapter 12. 
design alternatives. We still have the problem of choosing those design
alternatives which we want to test. As information systems commonly encountered
contain hundreds of elementary files and processes it appears that we are still
left with a formidable task when we try to obtain a reasonable near-optimum
design as we must limit the analysis to testing very few of the combinations
possible. However, we often observe that information systems have many properties
which reduce very significantly both the number of processes or files that would
profit from being joined with a specific process or file and also the number of
them that could be so joined without breaking system constraints. By carefully
utilizing these possibility-reducing system properties we can make the
"manual" choice of design alternatives tractable. In that case the possibility to
automatize the calculations made necessary by the matrix operations becomes
a great advantage. Although we have no optimizing algorithm to offer at present,
it is, of course, of great interest that our formalization of the design problem
opens up the possibility of later automatization of the whole problem solution.

We give first a presentation of how rows and columns in E 10 are to be joined 
in order to correspond to a grouping of processes (or consolidation of files, 
respectively) and how this can be done as generalized matrix operations. We 
then show how the number of feasible alternatives to try can be reduced - and 
how this reduction can be made automatic as well. 


3) Chapter 12 













2 Joining rows in E^10 to represent grouping of processes.

Each row in the incidence matrix E^10 for an information system represents a
process. Every element (column) in a row is associated with a specific file.

If an element is equal to 1 this indicates that the corresponding file is required
as an input to the process. If out of two processes to be grouped together either
one or both has a 1 in a certain column, then the row of E^10 for the system
where these two processes have been grouped together is to have a 1 in that
column position. Thus in combining rows of E^10 to represent process
grouping such columns as have only element values 0 or 1 obey the rules (which
are of course well-known set operations)

1a    1 ∪ 0 = 1

1b    0 ∪ 1 = 1

1c    1 ∪ 1 = 1

1d    0 ∪ 0 = 0


However, some elements in a row of E^10 have a negative value. It is easy to see
that if one row has -1 in a column position where the other row has 0 then the
resulting row must have -1 in that column. When it happens that one row has
-1 and the other row has 1, how should the combined row be formed then?

In other words, what is the rule for determining the value of -1 ∪ 1 in this
problem type? Clearly, we have to distinguish between two different situations.
If the file produced by the output operation indicated by -1 is only to be used
with the process with which we are to group it, then obviously both the input
indicated by 1 and the output indicated by -1 can be eliminated. Thus in this
case we want to have -1 ∪ 1 = 0. On the other hand, if the output file is also
required for other purposes, then the output operation can not be eliminated
while still the input to the process to be joined is eliminated. In this situation
we therefore shall have -1 ∪ 1 = -1.

Normally it can be seen from E^10 if a file which is output from one of the
processes we consider for grouping is input also to processes outside the group.
This, simply, is then indicated by the existence of other 1-values in that column
of E^10. (Note that several -1-values can not exist in one column of E^10.)
Therefore we can assume that when we order the joining of two rows of E^10
according to our rules, as set out above, it is uniquely defined within the system
itself which one of the alternatives to use. Only if a file is to be output for
use in other connections than as input to other processes represented in E^10
then the operation will not be defined. Thus for a unique definition we require
the E^10 to contain all processes of using all files it indicates by -1 in some
row. This can be done by adding one row to E^10 which represents "all other uses
of files". The same effect can be obtained by tagging all files thus used. It
follows that we can assume that all operations involved in combining rows of E^10
to represent process grouping, which have so far been discussed, can always be
uniquely defined at the time of performing such operations. There remains to
decide on what to do in the cases when multiple input and output file scans
are required by a process, that is when instead of 1 or -1 we have k and -l,
k and l being integers. It is reasonable to expect that if one process
requires k input or output scans of a certain file and another process, to be
grouped with the first one, requires l scans (l ≤ k) then the grouped process
requires k scans. However a combination of this kind will always be very
specific - if it would at all be feasible. Therefore we find it most reasonable,
at this stage, to decide that such situations will not be handled by our procedure.
If this is programmed for a computer the program will be supposed to print an
alarm thus asking for a precise guidance on this point from the human analyst.
(As we shall see later we may come to use elements of E^10 which are different
from 1 (and are not even integers) also when single scan operations are
considered, in order to take care of different processing periods. In such cases we
have, of course, to use the same logic as when only element values 1 or 0 occur.
Again we have a reason to use some sort of tagging to be able to distinguish this
case from multiple scan situations.)

Example. 




(this is the row for A∪B in the modified E^10)

We have (A∪B)_1 = -1 ∪ 1 = -1 because the grouped process A∪B needs
to output a for use as input to C. Also (A∪B)_4 = 1 because d needs only to
be input once to the combined process A∪B.

B∪C = [ 1  0  -1  1 ]

Here (B∪C)_2 = -1 ∪ 1 = 0 because when B and C are grouped the file b will
not have to be output, neither will it be input. If b would have to be used for
other purposes than indicated in E^10 this would have to be mentioned or tagged
in some way. Such tagging missing we will assume that no other use of b is
called for.
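
The joining rule of this section can be sketched in a few lines of Python. This
is an illustration under assumptions, not the author's program: the function
name `join_rows` is hypothetical, and the three rows A, B, C over the files
a, b, c, d are inferred here from the worked results quoted in the text (the
printed matrix of the example is not legible in this copy).

```python
def join_rows(E, group):
    """Join the rows of E listed in `group` into one row, using the ∪ rule.

    1 marks an input scan, -1 an output scan, 0 no use of the file.
    """
    outside = [row for i, row in enumerate(E) if i not in group]
    joined = []
    for j in range(len(E[0])):
        col = [E[i][j] for i in group]
        if any(abs(x) > 1 for x in col):
            # Multiple scans are left to the human analyst, as in the text.
            raise ValueError(f"multiple scans in column {j}: guidance needed")
        if -1 not in col:
            joined.append(1 if 1 in col else 0)      # rules 1a to 1d
        elif 1 not in col:
            joined.append(-1)                        # -1 ∪ 0 = -1
        else:
            # -1 ∪ 1: keep the output only if another process still reads it.
            used_elsewhere = any(row[j] == 1 for row in outside)
            joined.append(-1 if used_elsewhere else 0)
    return joined

# Rows A, B, C over files a, b, c, d (inferred, see the lead-in).
E10 = [
    [-1,  0,  0, 1],   # A
    [ 1, -1,  0, 1],   # B
    [ 1,  1, -1, 0],   # C
]

print(join_rows(E10, [0, 1]))   # row for A∪B
print(join_rows(E10, [1, 2]))   # row for B∪C
```

With these assumed rows the sketch gives -1 in column a and 1 in column d for
A∪B, and [1, 0, -1, 1] for B∪C, in line with the values stated above.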
Representing process grouping by a generalized matrix operation.

We now introduce an operation (∪) in such a way that it will lead to a
modification of E^10 as required by a specified grouping of processes. First we
notice that as we want to operate on rows of E^10 rather than on columns we have
to transpose so that we use (E^10)^T, instead of E^10, for the column selection
procedure. Thus if g is a vector which selects such rows of E^10 as we want to
combine we shall obtain our desired result by the operation (E^10)^T (∪) g. We
shall find it convenient to write this instead as

1    g^T (∪) E^10    plus transposing of result

as is often done in similar situations with conventional matrix algebra. Thus if
we write the selecting vector in transposed form to the left of a matrix we shall
obtain a selection of rows, rather than columns.

As g is only to select rows associated with such processes which we want to
group together, all elements of g will be 0 or 1. We shall take op_1 to be
simple multiplication and thus after the operation op_1 we shall, in this case,
have just the rows selected by unit elements in g. For op_2 we now take the
operation ∪ as defined in section 2. As an example let us look at


                          | e_11  e_12 |
g^T (∪) E = [0 1 1] (∪)  | e_21  e_22 |  =  [e_21  e_22] ∪ [e_31  e_32]
                          | e_31  e_32 |

            ⇒  [e_21 ∪ e_31,  e_22 ∪ e_32]


Let us now do this operation on the example of section 2.

Suppose we want to group the processes A and B. This is defined by

g^T = [1 1 0]

g^T (∪) E^10

if we use the same procedure as in section 2 to define the result of -1 ∪ 1.

It is easy to see that the composition B∪C also is obtained as in section
2 by g^T (∪) E^10.
Matrix operations to compute file transport reduction and memory space.

If we look at that part of two rows of E^10 which contain only 0 or 1 (but not
-1) then we find that the combination of rows, if done by op_2 = ∩ rather than
∪ (∩ being used to symbolize, as is common, the intersection operation:

1a    0 ∩ 1 = 0

1b    1 ∩ 0 = 0

1c    1 ∩ 1 = 1

1d    0 ∩ 0 = 0 )

will give as result an account of how many file scans are being saved by grouping
exactly two specific processes together. This fact suggests the use of "∩"
also for an operation which gives the total, resulting saving of file scans if
more than two processes are grouped and also when elements = -1 are contained
in the rows (as is always the case, of course).


We find that we have to put

x_1 ∩ x_2 ∩ ... ∩ x_n = k - 1

if exactly k of the x_i's have value 1, the others being = 0.

If one of the values in a column, among the rows to be combined, is -1 (there
can only be one) then the result is well defined by the specification given in
section 2.

Thus if in the sequence x_1 ∩ x_2 ∩ ... ∩ x_n one x_i = -1 and only one
x_j (j ≠ i) has value 1, all others being 0, then x_1 ∩ x_2 ∩ ... ∩ x_n = 2 if
no other uses of file x are indicated (for both an output corresponding to -1
and an input corresponding to 1 are eliminated).

If x_i = -1 and k of the other x_j-values = 1 then we shall have
x_1 ∩ x_2 ∩ ... ∩ x_n = k + 1, if again no other uses of x are indicated. If the
file x is also used in processes not covered by x_1, x_2, ..., x_n, that is if
also other rows of E^10 have a non-zero element in this column, then the saving
is = k.


We find that if we define g^T (∩) E^10 in a way that is analogous to the
definition above for g^T (∪) E^10, the only difference being op_2 = ∩ (as defined
here instead of op_2 = ∪ as defined earlier) then g^T (∩) E^10 will result in a
row vector having an element for each file, telling how many scans of that file
are being saved by grouping the processes represented by unit elements in g.

Of course the number of scans saved can also be obtained by simply adding
the absolute values of all elements in each column of E^10 and of the modified
matrix E_g^10 and taking the difference.


Example 1. If we take the example of section 2 and consider grouping the
processes A and B we have

A_a ∩ B_a = -1 ∩ 1 = 1    (as file a has to be output (in A∪B) because it is
                           needed also in the process C.)

A_b ∩ B_b = 0 ∩ -1 = 0

A_c ∩ B_c = 0 ∩ 0 = 0

A_d ∩ B_d = 1 ∩ 1 = 1

hence

                   a  b  c  d
g^T (∩) E^10 = [   1  0  0  1 ]

B_b ∩ C_b = -1 ∩ 1 = 2 (rather than 1)

because b is not used elsewhere (hence both the input and output file scans are
saved).

We have now defined the operations g^T (∪) E^10 and g^T (∩) E^10 and know
that both result in one single row. It is easy to see that g can be obtained
by forming the "logical sum" of those columns of the unit matrix I associated
with the row numbers of E^10 selected by g^T. It follows that if G is the
matrix obtained from I by joining the rows in groups as selected by different
g^T and for different, simultaneous groupings of processes leaving the rest
of I unchanged, then

5    G (∪) E^10 ⇒ E^10 for the modified system ⇒ E_g^10

The row vector s obtained by the operation s = g^T (∩) E^10 defines the
multiplicity of scans saved for each file by the grouping g. Hence if

6    v = [v_i], i = 1, 2, ..., n

is a vector of which element v_i is a measure of the transport work (transport
time) for the file i, then the scalar product

7    s · v = Σ_{i=1}^{n} s_i v_i

gives a measure of the total transport saving obtained by the process grouping
defined by g.
Example 2. We take the same system as in the example of sections 2 and
5 and add one row to E^10, labelling it "D" and also add column d_1. We
consider grouping (A and B) and also (C and D).


The multiplicity of file scans is given by 


a 

b 

c 

d 

ai 

--E 

and 

2 

2 

2 

1 | for the original E 10 

vCl 

2 

0 

1 

1 | for the grouped system 

Let the file transport measures be given by 

V = E 

2 

3 

4 

Z3 


Then the total file transport is 


trp tot = 1 * 3 + 2.2 + 3 • 2 + 4 • 2 + 1 • 1 = 22 for E 10 
and 

(trp tot)g = 1 • 2+2* 2 + 3* 0 + 4- 1 + 1- 1 = 11 for E g 10 

As »n illustration let us draw diagrams showing the original system S (with 
E 10 ) and the modified system S (with Eg 10 ). 







It Is seen from the results above that we saved a total of 22 - 11 = n in file 
transport. We now compute this by the operation g T (/-M E 1 **. We have two 
groupings g T ^ and g T2 


S 1 = g Tl > El ° = 

®2 = ST2 )E 10 = 

We get 

S 1 = gTl )E 10 = 

s 2=St 2 < r ') El ° 


A 

B 

C 

D 



F 

1 



(O)E 10 







L_ 


1 

_D 

(n)E 10 

a 

b 

c 

d 

d 1 


E 



1 




2 


The resulting saving is defined by the vector 
a b c d d* 

8 = s 1 + s 2= H 2 1 1 


Hence, the total saving is 

s’ v = 1 • 1 + 2* 0 + 3* 2 + 4* 1=11 
in agreement with the result above. 
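The arithmetic of Example 2 can be retraced in a few lines of Python. This is only an illustrative sketch: the scan multiplicities and the vector v are the figures quoted in the example, while the names `scans_original`, `scans_grouped` and `dot` are introduced here for illustration and do not occur in the text.

```python
# Figures quoted in Example 2; file order is a, b, c, d, d1.
scans_original = [3, 2, 2, 2, 1]  # scans per file in the original system
scans_grouped = [2, 2, 0, 1, 1]   # scans per file after grouping (A,B) and (C,D)
v = [1, 2, 3, 4, 1]               # transport measure of each file


def dot(x, y):
    """Scalar product of two equally long sequences."""
    return sum(a * b for a, b in zip(x, y))


trp_tot = dot(scans_original, v)    # total transport, original system
trp_tot_g = dot(scans_grouped, v)   # total transport, grouped system
s = [o - g for o, g in zip(scans_original, scans_grouped)]  # scans saved per file
saving = dot(s, v)                  # the scalar product s . v

print(trp_tot, trp_tot_g, s, saving)
```

The saving computed from s agrees with the difference of the two transport totals, as the example states.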


The grouping of processes does not only bring reduction of file scans. It also
brings about increased memory requirement. This is because when several
processes are to be performed in the same file scan it will often be required to
have their programs in memory at the same time. The increased memory can
be regarded as the price paid for reduced file scan multiplicity. This price
could possibly buy other advantages so we need to analyze the alternatives.

It is thus seen that we want to be able to compute also the memory requirement
induced by the grouping of processes. We assume here that we have solved
the ("component") problem of determining the memory need for each simple
process and we ask how this information can be used to compute the memory
need of the grouped process (that is to solve the "systems problem").

As a very crude first approximation one can assume that the memory space
requirement for a grouped process can be represented as one space common
to the whole group plus one additional space quantity for each simple process
in the group. If then

      $m = [m_i]$

is a vector (for the original system) such that $m_i$ = the additional memory need
for the simple process "i", then the corresponding vector for the system modified
by the grouping of processes defined by G is obtained as

8     $m_g = G m$

where conventional matrix multiplication is used.
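As a minimal sketch of relation 8, the following Python fragment builds a grouping matrix G from the unit matrix and multiplies it by m. The memory figures in `m` are hypothetical, the helper names `grouping_matrix` and `mat_vec` are introduced here for illustration, and the ordering of the rows of G (grouped rows first) is an arbitrary choice. The space common to a whole group is not included, since m holds only the additional needs.

```python
def grouping_matrix(n, groups):
    """Join the rows of the n x n unit matrix as prescribed by `groups`.

    Each group is a set of process indices. One row is emitted per group
    (in the order given), followed by one unchanged unit row for every
    process that belongs to no group.
    """
    grouped = set().union(*groups)
    rows = [[1 if j in grp else 0 for j in range(n)] for grp in groups]
    rows += [[1 if j == i else 0 for j in range(n)]
             for i in range(n) if i not in grouped]
    return rows


def mat_vec(G, m):
    """Conventional matrix-by-vector multiplication."""
    return [sum(g * x for g, x in zip(row, m)) for row in G]


m = [4, 3, 5, 2]  # hypothetical additional memory needs of A, B, C, D
G = grouping_matrix(4, [{0, 1}, {2, 3}])  # group (A, B) and (C, D)
m_g = mat_vec(G, m)
print(G, m_g)
```

Each element of `m_g` is the sum of the additional memory needs of the processes joined in the corresponding row of G.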
5. Calculations for minimum file transport design.

Once the process groupings have been selected, with regard to memory space
limitation, process ordering, file sorting and so on, and defined by $g_1$,
$g_2$, ..., the matrix G is defined and we can produce the modified system
incidence matrix $E_g^{10}$, provided $E^{10}$ is given. If multiple file scan
processes are to be involved in the grouping, we must also have provided
specifications for how the grouping influences them. We can then compute the
total transport amounts

2     $tr = 1^T E^{10} \cdot v$       ($1^T$ being the transposed vector
                                       consisting of elements 1 only)

      $tr_g = 1^T E_g^{10} \cdot v = 1^T G (\cup) E^{10} \cdot v$  1)

and compute the saving $tr - tr_g$. Hence we find the transport saving

3     $(\sum_i g_i^T (\cap) E^{10}) \cdot v$.

The grouping design problem can thus be stated as

4     $1^T G (\cup) E^{10} \cdot v$  min.

or

5     $(\sum_i g_i^T (\cap) E^{10}) \cdot v$  max.

while $g_i$ (or G) are such as to satisfy the constraint

      $(G \cdot m)_i$ not greater than available memory space, for any i.

To develop this procedure into a fully automatic routine we still have
to establish an algorithm for selecting the set of different groupings to be tested.
The number of feasible groupings may be very small, although the total number
of files and processes in the "un-grouped" system may be very large. Therefore
a formalization of the selection of feasible groupings (that is groupings satisfying
the conditions of order which the relations defined by $E^{10}$ prescribe)
may in some cases reduce the number of combinations enough to make possible
a full analysis of all of them. In such cases, then, we would have achieved
a fully automatic solution of this design problem. For more complex systems
such full automatization may not be reached. Then such a procedure would
still be useful, for it will reduce the amount of intuitive - or "manual" -
choice of the different designs to be tested. It also seems feasible to use a
procedure where a set of random selections of the set of feasible groupings
is made and then tested to define the best of them which, while not exactly
optimal, may well be a better solution than would be obtained from a few
intuitive choices.

1) Notice that the arithmetic rules involved in operations upon $E^{10}$ are special,
especially as far as negative elements of $E^{10}$ are concerned.
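The following Python sketch illustrates the kind of full analysis discussed above, under simplifying assumptions that are not part of the text: only pairwise groupings are considered, the saving of each candidate pair is taken as given, savings of disjoint groups are simply added, and the memory test ignores the space common to a group. The function `best_design` and all figures are hypothetical.

```python
from itertools import combinations


def best_design(pair_savings, memory, limit):
    """Test every set of disjoint candidate pairs and keep the best one.

    pair_savings: {(p, q): transport saved by grouping p with q}
    memory:       {p: additional memory need of process p}
    limit:        available memory space for any one grouped process
    """
    best, best_saving = (), 0
    pairs = list(pair_savings)
    for r in range(1, len(pairs) + 1):
        for design in combinations(pairs, r):
            used = [p for pair in design for p in pair]
            if len(used) != len(set(used)):
                continue  # a process may belong to one group only
            if any(memory[p] + memory[q] > limit for p, q in design):
                continue  # memory constraint violated
            saving = sum(pair_savings[pair] for pair in design)
            if saving > best_saving:
                best, best_saving = design, saving
    return best, best_saving


# Hypothetical figures for four processes A, B, C, D.
pair_savings = {("A", "B"): 5, ("B", "C"): 5, ("C", "D"): 6}
memory = {"A": 4, "B": 3, "C": 5, "D": 2}
print(best_design(pair_savings, memory, limit=8))
```

For larger systems the exhaustive loop could be replaced by a random sample of designs, in the spirit of the random selection procedure mentioned above.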

We conclude this section by pointing out that for the design problem of 
minimizing process groupings we are already by the present results in a si¬ 
tuation which is very similar to what we have in many engineering problems, 
where a fully automatic design procedure is not yet available but where com¬ 
puters are of great help in reducing design analysis work and improving design 

quality. 

We also point out, however, that in a real information system design we have 
to undertake a second kind of system modification since not only do we want 
to group processes in order to save file transport but also we want to conso¬ 
lidate files in order to keep input-output equipment within reasonable bounds. 
This problem will be studied below in section 7. 
6. Procedures for aiding the intuitive system design phases.

We have already made clear (we hope) that we are not trying to obtain full 
automatization of the design of an information system in this book. Rather we
are satisfied with obtaining a formalization which helps intuitive design in 
different ways. One way it helps is by making possible automatic computer 
analysis of design alternatives. Another way would be to help in selecting dif¬ 
ferent groups of processes which are worth considering for finding the design 
alternatives to test. In small systems, such as we may use as illustrations in 
a paper like this - and in education exercises - it may be easy to see direct¬ 
ly which choices are worth trying, by looking at the graph or the incidence 
matrix E 10 of the system. A real system will, in general, be much too big for 
this to be a feasible procedure. In such a situation it will be of great value to 
have a computer for help. 

We have already in section 5 seen how a calculation can be used to show whether
a certain process "i" would be worth combining with another one, "j", in
order to reduce file transport. Thus we found that if $g_{ij}$ is the grouping vector
defining the grouping of the processes "i" and "j" then

      $(g_{ij}^T (\cap) E^{10}) \cdot v$      (v = file transport vector) 1)

would measure the transport saved. By letting j run through all values we
will obtain a measure of the value of combining any process "j" with a certain
process "i". In this way we obtain not only an indication of which processes
are worth combining with any specific process, but also a possibility to compare
the gain from any such pairing. While it is still quite a problem to establish
algorithms for using this kind of information for optimization, it is obvious that
this information will, itself, be a valuable aid for the human who has to make
the design.

We can display the result of the analysis described in the preceding paragraph 
in a matrix where we have one column and one row for each process and where 
each element in a column measures the transport saved by grouping the process 
for the column with that associated with the row position of the element. The 
procedure described will reduce the problem of choice for the system design 
to an extent which is dependent on the topological properties of the basic
information system (or of $E^{10}$). Although a detailed analysis of the kind assumed
here, and in La 63-1, has not yet been used to its full extent, it can already be
said that most real systems have the properties necessary to make the reduction
obtained quite significant.
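The matrix of pairwise savings described above can be sketched in Python as follows. Two things are assumptions made here: the incidence matrix `E` is a reconstruction of the four-process example from the relations stated in the examples (+1 for an input file, -1 for an output file), and the rule in `scans_saved` is inferred from Example 1 (a common input saves one scan; an output that is input to the partner saves one scan if the file is used elsewhere, otherwise two). The sketch ignores the ordering condition on feasible groupings.

```python
PROCS = ["A", "B", "C", "D"]
FILES = ["a", "b", "c", "d", "d1"]
# Assumed reconstruction: +1 = file is input to the process, -1 = output.
E = [
    [-1,  0,  0, 1,  0],  # A
    [ 1, -1,  0, 1,  0],  # B
    [ 1,  1, -1, 0,  0],  # C
    [ 0,  0,  1, 0, -1],  # D
]
v = [1, 2, 3, 4, 1]  # transport measure of each file


def scans_saved(E, i, j):
    """Scans saved per file when processes i and j are grouped."""
    saved = []
    for f in range(len(E[0])):
        x, y = E[i][f], E[j][f]
        used_elsewhere = any(E[p][f] > 0 for p in range(len(E)) if p not in (i, j))
        if x > 0 and y > 0:
            saved.append(1)                           # one common input scan
        elif x * y < 0:
            saved.append(1 if used_elsewhere else 2)  # output feeding the partner
        else:
            saved.append(0)
    return saved


def saving_matrix(E, v):
    """M[i][j] = transport saved by grouping process i with process j."""
    n = len(E)
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                M[i][j] = sum(s * w for s, w in zip(scans_saved(E, i, j), v))
    return M


M = saving_matrix(E, v)
for name, row in zip(PROCS, M):
    print(name, row)
```

Each row of `M` then shows, for one process, how much transport each possible partner would save.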

We now turn to another procedure for further reduction of the choice problem. 

It is, by the way, not an operation that we want to perform in order to simplify
design analysis but is in fact a logically necessary one. It is related to what
conditions must be satisfied in order that two processes may at all be grouped
together. Such a condition can easily be seen to be associated with the ordering
structure imposed upon the information system by the information precedence
relations discussed in 27.6-16. This precedence matrix should be sufficient for
testing this condition.

1) Notice the special rules for matrix algebraic operations involving E defined above.

It is easy to see that if process "j" is a precedent to process "i" but is not
an immediate (or 1:st) precedent, then "i" and "j" cannot be grouped. If a
process "k" is a 1:st precedent of "i" (that is, k « i), this means that a
file that is an output from "k" will be an input to "i". This, of course, is
no hindrance for "k" to be grouped with "i". If however j « k and k « i
then "j" is a 2:nd precedent of "i". In this case "j" cannot be grouped with
"i", for "i" cannot be run before "k" has been performed. Obviously it is
impossible to group "i" and "j" if "j" is an n:th precedent of "i" for any
n-value greater than 1. We thus have the condition 1)

2     Condition: Two processes "i" and "j" cannot be grouped together if
      "j" has a precedence relation with "i" which is of higher order than
      the 1:st.

Notice that it may happen that both j < i and j « i hold. For instance the set 
j « k 

3 k « i 
j «i 

has the structure 

j k i 
Fig. 1. 

which is quite feasible. In this case "j" has precedence relations with "i" of
both 1:st and 2:nd order and, hence, the condition for not being groupable is
fulfilled for the process pair i and j in this example. Obviously, if the condition
above is not satisfied, then "i" and "j" can be grouped, that is: "i" and "j"
can be grouped if either

4     i ≮ j (and j ≮ i)
      or
      i « j (or j « i)

and no higher order precedence relation holds between "i" and "j". It follows
that we can test whether two processes "i" and "j" are "groupable" by computing
the total precedence sets of both.

If P is the precedence matrix for the files in the system, then each column in
P is associated with a file which is output from one process. Each row of P
is associated with files which are input to a process. Hence each column in P
is associated with a process and selects files which are immediate precedents
of the column file and therefore also indicates which processes are 1:st
precedents of the process of that column. This is because each input file is
produced by exactly one process.

5 It is easy to show 2) that the squared (or "iterated") matrix $P \cdot P = P^2$
selects files which are 2:nd precedents of the files associated with its
respective columns. $P^3$ likewise selects 3:rd precedents and so on.

1) Cf. 27.6-16; Restriction rule for grouping.

2) Cf. Chapter 12 (System Algebra).

It follows that if the file produced by the process "j" (let us call it the
file "j") is not selected by the i:th column of $P^2$, $P^3$, ... and vice
versa, then "i" and "j" may be grouped. It can also be shown that for
a certain value of the integer l, $P^l \neq 0$ while $P^{l+k} = 0$ for any positive
integer k. l is a topological constant for every feasible system. Thus we
need only perform a limited number of matrix iterations.
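The test can be sketched in Python as below. For brevity the sketch works on a process-level relation Q rather than on the file precedence matrix P itself, which is a simplification made here; Q[i][j] = 1 is taken to mean that process i is a 1:st precedent of process j. The four-process relation used (A « B, A « C, B « C, C « D) and the names `bool_product` and `higher_order_pairs` are assumptions for illustration.

```python
def bool_product(X, Y):
    """Boolean matrix product: element (i, j) is 1 if some k links i to j."""
    n = len(X)
    return [[int(any(X[i][k] and Y[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]


def higher_order_pairs(Q):
    """Pairs of processes linked by a precedence of order 2 or higher."""
    n = len(Q)
    blocked = set()
    power = bool_product(Q, Q)  # second-order precedence
    for _ in range(n):          # a cycle-free system needs at most n steps
        if not any(any(row) for row in power):
            break               # all further powers vanish as well
        blocked.update(frozenset((i, j))
                       for i in range(n) for j in range(n) if power[i][j])
        power = bool_product(power, Q)
    return blocked


PROCS = "ABCD"
Q = [
    [0, 1, 1, 0],  # A precedes B and C
    [0, 0, 1, 0],  # B precedes C
    [0, 0, 0, 1],  # C precedes D
    [0, 0, 0, 0],
]
not_groupable = sorted(tuple(sorted(PROCS[i] for i in pair))
                       for pair in higher_order_pairs(Q))
print(not_groupable)
```

The loop stops as soon as a power vanishes, which is the limited number of iterations referred to above.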

It is also easy to show that P can be computed from $E^{10}$. Let $\bar{E}^{10}$ be the
matrix obtained from $E^{10}$ by replacing by 1 the absolute value of all elements
different from zero. Then obviously we can write $\bar{E}^{10}$ as a difference of two
matrices which have all their elements either equal to 1 or to 0:

6     $\bar{E}^{10} = (P^{01})^T - P^{10}$

where $P^{10}$ is a matrix which for each file has a column which selects the
processes contributing to its production (that is the processes preceding the file)
and $P^{01}$ has a column for each process, selecting the files preceding it (the
input files of the process).

We have (see chapter 12)

7     $P = P^{00} = P^{01} \cdot P^{10}$

This is easily seen because the i:th column $P_i^{10}$ of $P^{10}$ selects, by its
elements, the processes contributing to "i", and in $P^{01} \cdot P^{10}$ these elements
then select the columns of $P^{01}$ associated with these processes. Finally these
columns select the files which are inputs to the processes contributing to "i"
and which are thus 1:st precedents of "i". If all the files are elementary, then
each file is produced by one single process only and each column in $P^{10}$ selects
one column only of $P^{01}$. During the design process, however, we are grouping
processes and consolidating files.
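Relations 6 and 7 can be tried out with the short Python sketch below. The incidence matrix `E` is an assumed reconstruction of the four-process example (+1 for input, -1 for output), the matrix products are taken as Boolean products so that the results contain only 0 and 1, and the names `split_incidence` and `bool_product` are introduced here for illustration.

```python
FILES = ["a", "b", "c", "d", "d1"]
# Assumed reconstruction of the example: rows A, B, C, D; +1 input, -1 output.
E = [
    [-1,  0,  0, 1,  0],
    [ 1, -1,  0, 1,  0],
    [ 1,  1, -1, 0,  0],
    [ 0,  0,  1, 0, -1],
]


def split_incidence(E):
    """Return P01 (files x processes, inputs) and P10 (processes x files, outputs)."""
    n_proc, n_file = len(E), len(E[0])
    P01 = [[int(E[p][f] > 0) for p in range(n_proc)] for f in range(n_file)]
    P10 = [[int(E[p][f] < 0) for f in range(n_file)] for p in range(n_proc)]
    return P01, P10


def bool_product(X, Y):
    """Boolean matrix product of X (r x m) and Y (m x c)."""
    return [[int(any(X[i][k] and Y[k][j] for k in range(len(Y))))
             for j in range(len(Y[0]))] for i in range(len(X))]


P01, P10 = split_incidence(E)
P = bool_product(P01, P10)  # file precedence matrix, relation 7

# Iterate until a power vanishes; the guard stops a system containing a cycle.
powers = [P]
while any(any(row) for row in powers[-1]) and len(powers) <= len(FILES):
    powers.append(bool_product(powers[-1], P))
order = len(powers) - 1  # largest l with a non-zero l-th power (cycle-free case)

for name, row in zip(FILES, P):
    print(name, row)
print("order:", order)
```

Column c of the second power selects the files a and d, the 2:nd precedents of c in this reconstruction.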

EXAMPLE. We take $E^{10}$ of example 2 in section 4 and apply 6, that is we
take all negative elements in $E^{10}$ and move them to a second matrix:

      $E^{10} = (P^{01})^T - P^{10}$

      $(P^{01})^T$ =     a  b  c  d  d^1          $P^{10}$ =     a  b  c  d  d^1
                    A    0  0  0  1  0                      A    1  0  0  0  0
                    B    1  0  0  1  0                      B    0  1  0  0  0
                    C    1  1  0  0  0                      C    0  0  1  0  0
                    D    0  0  1  0  0                      D    0  0  0  0  1

We then obtain P by 7:

      $P = P^{00} = P^{01} \cdot P^{10}$ =      a  b  c  d  d^1
                                          a     0  1  1  0  0
                                          b     0  0  1  0  0
                                          c     0  0  0  0  1
                                          d     1  1  0  0  0
                                          d^1   0  0  0  0  0

      $(P^{00})^2$ =     a  b  c  d  d^1          $(P^{00})^3$ =     a  b  c  d  d^1
                  a     0  0  1  0  1                        a     0  0  0  0  1
                  b     0  0  0  0  1                        b     0  0  0  0  0
                  c     0  0  0  0  0                        c     0  0  0  0  0
                  d     0  1  1  0  0                        d     0  0  1  0  1
                  d^1   0  0  0  0  0                        d^1   0  0  0  0  0

      $(P^{00})^4$ =     a  b  c  d  d^1
                  a     0  0  0  0  0
                  b     0  0  0  0  0
                  c     0  0  0  0  0
                  d     0  0  0  0  1
                  d^1   0  0  0  0  0

      $(P^{00})^5 = 0$.

P can be checked against figure 1 of section 4. We now obtain by inspection
of the columns of $(P^{00})^2$, $(P^{00})^3$, and $(P^{00})^4$ that for instance the
c-columns of these matrices select (a, d); d; 0; respectively, that is, a and d
are precedents of c of higher order than the 1:st.

It is seen that c is produced by C, a by A, and d by no process. Consequently
we have established that A and C cannot be grouped. In this way we find:

      A cannot be grouped with C, D,
      B cannot be grouped with D,
      C cannot be grouped with A,
      D cannot be grouped with B, A.


The result has been displayed in the matrix M below by shadowing all boxes
corresponding to non-feasible grouping.

[Matrix M, with one row and one column for each of the processes A, B, C, D;
note in the figure: "if no files are used elsewhere".]

In the boxes left open we have inserted the transport savings associated with
the corresponding groupings. The ordering condition 2 is seen to have brought
quite a reduction of the possibilities to be tested. On the other hand the
calculation of transport quantities did not, in this example, bring any reduction as
all the feasible combinations save some transport so that there is no one which 
is not worth while. On the other hand the different amounts of saving to be 
achieved by the different pairings is of course information which can be used 
to guide the design. 

It is easy to give an example of a system such that also the transport quantities 
would serve to reduce the number of groupings worth testing. Thus if we add to 
the system a process E, the output file of which is an input to process D, then 
we have 


      E ≮ A
      E ≮ B
      E ≮ C
      E « D

and hence E may be grouped with any of all the other processes. On the other 
hand only the grouping with D will save any transport. If the file produced by E 
has a transport amount equal to 1 the saving to be obtained by grouping E with 
D would be 2. For this system we would have the M matrix modified to 


[Matrix M, with one row and one column for each of the processes A, B, C, D, E.]

In addition to those properties of an information system, which we have used 
so far to reduce the number of design configurations to be tested, one very 
significant property remains - the different sorting sequences used with differ¬ 
ent files. While it seems desirable that the design analysis should also contri¬ 
bute to efficient solution of that problem and therefore the sorting strategy 
should be regarded as not fixed during the analysis, it appears to be most 
realistic at the present state of the art to regard file sequence to have been 
specified before the analysis starts. In that situation the sorting sequence of 
the different files will also define a partitioning of the set of processes into 
one subset for each set of file sequences. In other words, if a set of files a, b, 
c, d, e is such that a, b, and c have one file sequence (that is are sorted to the 
same key) and d and e have another sequence, then any process associated 
with any of the files a, b, and c cannot possibly be grouped with a process 
associated with d or e. In most practical systems this sorting condition will 
bring about a very significant subdivision of the system into much smaller 
subsystems, which are completely separated with respect to the process 
grouping analysis discussed above (and also with respect to the file consolidation
problems to be covered below).

It may be of interest to those familiar with abstract algebra that the property 
of having the same sorting sequence satisfies the axioms of an equivalence 
relation. From this follows that only processes belonging to the same equiva¬ 
lence class may be grouped. The problem of defining this partitioning of the 
set S of all processes then is the problem of defining the quotient set S/M, if 
M is a subset of S which defines all sorting sequences. 
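The partitioning into equivalence classes can be sketched in Python as follows. The sketch assumes, as a simplification made here, that every process is associated only with files of one and the same sorting sequence; the process names P1 to P4, the key labels and the function `partition_by_sequence` are hypothetical.

```python
from collections import defaultdict


def partition_by_sequence(process_files, file_sequence):
    """Group processes into classes, one class per sorting sequence."""
    classes = defaultdict(list)
    for process, files in process_files.items():
        keys = {file_sequence[f] for f in files}
        if len(keys) != 1:
            raise ValueError(f"process {process} uses files of several sequences")
        classes[keys.pop()].append(process)
    return dict(classes)


# Files a, b, c are sorted to one key, d and e to another (as in the text).
file_sequence = {"a": "key 1", "b": "key 1", "c": "key 1",
                 "d": "key 2", "e": "key 2"}
# Hypothetical association of processes with files.
process_files = {"P1": ["a", "b"], "P2": ["b", "c"],
                 "P3": ["d", "e"], "P4": ["e"]}

classes = partition_by_sequence(process_files, file_sequence)
print(classes)
```

Only processes found in the same class need then be examined for grouping.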



7. Defining file consolidation by matrix operation on $E^{10}$.


As has been discussed before one of the basic problems in information 
processing system design is to consolidate the small elementary files into 
larger and fewer ones. This is necessary in order to keep input-output require¬ 
ments within economic bounds, e. g. the number of tape handlers. If a so-called 
random access mass memory is used, this may not be a problem. Then instead 
we have block size built into the equipment. We may also have to make a com¬ 
bined design by using both kinds of storage. Thus the problem of efficient file 
consolidation is a problem of general importance for the information system 
design, whether it is built upon serial access memories, random access me¬ 
mories, or both. 

If two files, a and b for instance, are consolidated into one, then this consolidated
file will have to be input to every process where either a or b is needed
for input. Thus the corresponding combination of the columns a and b of $E^{10}$
will result by the common union operation 1 ∪ 0 = 1, 0 ∪ 1 = 1, 1 ∪ 1 = 1, and
0 ∪ 0 = 0 for such parts of columns a and b which have elements equal to 0 or 1
only. Such elements which have value = k (k > 1) will be supposed to obtain
special treatment in analogy with section 4. The consolidated file will have to
be output in one version from the process producing a and then in another ver¬ 
sion from the process producing b. In addition to this (if it is not a standing 
file) it may have to be input to that one of these processes which will be per¬ 
formed subsequent to the other one. (Notice that this will increase the demand 
for input equipment for that process and, hence, will often call for a further 
consolidation, which is then to be performed with another input for that pro¬ 
cess. ) It must be recognized, however, that when the combined file is output 
from the first of the two processes, it does not yet have its final form and it 
must be given an identifier of its own. 

For instance, let us assume that we have a system as shown in figure 1. 



Fig. 1. 


We now want to see what happens if we decide to consolidate e with a, forming 
(ae). We assume that e is new information entering the system, d may either 
be new information or a standing file which is called "a” when updated. To 
consolidate, we may either have to insert an additional process a, e —> (a,e) 
or we take e as an input to A. We decide to choose the latter alternative. 

Fig. 2 shows the modified system.

If instead we assume both e and a to be standing files, we obtain 1) the system
of figure 3, and when a and e are consolidated, we get figure 4.

Fig. 4.

The incidence matrices corresponding to figures 1 and 2 are

      $E^{10}$ =         a   b   c   d   e   d^1
                  A     -1   0   0   1   0   0
                  B      1  -1   0   1   0   0
                  C      1   1  -1   0   0   0
                  D      0   0   1   0   1  -1

and

2     $E_c^{10}$ =       ae  b   c   d   e   d^1
                  A     -1   0   0   1   1   0
                  B      1  -1   0   1   0   0
                  C      1   1  -1   0   0   0
                  D      1   0   1   0   0  -1

1) We have made some arbitrary choices here, for illustration.

It is seen that column (ae) in $E_c^{10}$ is obtained as e ∪ a (from $E^{10}$) and that in
addition a new column for e is introduced. This column simply assigns e to be
input to that process which outputs the file with which e is to be consolidated.
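The column operation just described can be sketched in Python. The matrix `E` is an assumed reconstruction of the system of figure 1 (+1 for input, -1 for output), the sketch handles only elements 0, +1 and -1, and it covers only the case where e is new information with no producing process; the function name `consolidate_new_file` is introduced here for illustration.

```python
FILES = ["a", "b", "c", "d", "e", "d1"]
# Assumed reconstruction of figure 1: rows A, B, C, D; +1 input, -1 output.
E = [
    [-1,  0,  0, 1, 0,  0],
    [ 1, -1,  0, 1, 0,  0],
    [ 1,  1, -1, 0, 0,  0],
    [ 0,  0,  1, 0, 1, -1],
]


def consolidate_new_file(E, files, a, e):
    """Consolidate new-information file e with file a.

    Column a becomes the union of columns a and e (renamed a+e), and
    column e is reduced to a single input to the process producing a.
    """
    ia, ie = files.index(a), files.index(e)
    producer = next(p for p, row in enumerate(E) if row[ia] < 0)
    new_files = [a + e if f == a else f for f in files]
    new_E = []
    for p, row in enumerate(E):
        new_row = list(row)
        new_row[ia] = row[ia] if row[ia] != 0 else row[ie]  # union of a and e
        new_row[ie] = 1 if p == producer else 0             # e is input to producer
        new_E.append(new_row)
    return new_files, new_E


files_c, E_c = consolidate_new_file(E, FILES, "a", "e")
print(files_c)
for row in E_c:
    print(row)
```

Under these assumptions process D now reads the consolidated file ae instead of e, and process A takes e as an additional input.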
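For the systems of figures 3 and 4 we obtain analogously

      $E^{10}$, with columns a, b, c, d, d^1, e, a^1, e^1
      (non-zero elements of each row, in column order):

            A     1   1  -1
            B    -1   1   1   1
            C     1  -1   1
            D     1  -1   1
            E     1   1  -1

      $E_c^{10}$, with columns ae, b, c, d, d^1, a^1e, a^1e^1
      (non-zero elements of each row, in column order):

            A     1   1  -1
            B    -1   1   1
            C     1  -1   1
            D     1  -1   1
            E     1   1  -1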


On comparing these two matrices we find some rules for constructing columns
in $E_c^{10}$ from those of $E^{10}$ in accordance with a prescribed consolidation of two
standing files. These rules are obviously of general validity. First we see that
the consolidated file (ae in this case) simply takes the place of that one of the
original files which is chosen as the first one to be updated.

In the example this means that column ae of $E_c^{10}$ is put equal to column a of
$E^{10}$, corresponding to taking ae in place of a as input to A. Then the
"half-updated" file $a^1e$ is uniquely determined by its simple role of being output
from one of the two updating processes and input to the other. Finally the
fully updated file $a^1e^1$ gets its column by forming $a^1 \cup e^1$ with the modification
that the process updating the first component of the consolidated file is not
outputting the fully updated file. Thus in the example the -1 for process A in
$a^1e^1$ has to be deleted. (Notice that this is a general rule for updating
consolidated, standing files.)

We can describe the process of "consolidating" the "file columns" of $E^{10}$ in a
somewhat different way which better exhibits its close relation to the joint
operation ∪. Thus, to form $E_c^{10}$, to be obtained from $E^{10}$ by consolidating two
standing files a and e (for instance), replace $a^1$ and $e^1$ by $a^1 \cup e^1$. (Here upper
1 denotes an updated version.) Let a be the name of that file which is first
updated. Then in $E^{10}$ replace the column name a by ae and replace the name e
by $a^1e$. Finally move the -1 which stands in the row of the first updating
process (process a - or A in this case) from column $a^1e^1$ to column $a^1e$.

We can now conclude that by a somewhat specialized definition of the operation
(∪) we can compute $E_c^{10}$ from $E^{10}$ by

      $E_c^{10} = E^{10} (\cup) C$

      C = a matrix which specifies the file consolidations considered.

The consolidation matrix C (or rather its transpose $C^T$) is defined in analogy
with the definition of the grouping matrix G introduced before.

The incidence matrix for a system design after one or more operations of
grouping and consolidation will be obtained as

      $E_{gc}^{10} = G (\cup) E^{10} (\cup) C$

with the proviso that the operations (∪) have some different special features
in the grouping and the consolidation phase respectively (and, hence, need
different symbols in an automatic implementation).

It is seen that, in general, the file consolidation increases the amount of file 
transport. This is the price we have to pay in order to reduce the amount of 
I/O equipment. (Thus file consolidation is by no means a generally desirable 
thing by itself, contrary to what is often stated.) On the other hand, "transport” 
time involved in tape set-up may sometimes be reduced to such an extent that 
the total result of the consolidation is reduced time. 

Another important observation is that often most or all of the increase in trans¬ 
port time by consolidation of files can be eliminated by suitable process grouping. 
As the system design work will contain a sequence of alternating process group¬ 
ings and file consolidations, we have thus come up with another factor to be 
considered when we search for the best possible process grouping. Thus the 
potential for file consolidation that a certain grouping carries with it has to be 
considered in combination with the reduction in transport it brings. Alternatively
this can be expressed by saying that not only savings of transport in the
system as it stands but also potential savings brought to possible subsequent
design states after file consolidation have to be estimated. Thus, each grouping
operation has to be searched in a certain stage of consolidation and vice versa.

While the search for the best grouping (for a certain stage of consolidation of 
files) considers the maximization of transport reduction (plus maximization or 
"satisfication" of file consolidation) any file consolidation tried has to be tested 
for maximum reduction of I/O-equipment (plus minimization of transport 
increase). 

Thus a procedure for aiding design in this stage would compute equipment
reduction. This is easily seen to be obtained from the maximum value of the
row-sums of $E_c^{10} = E^{10} (\cup) C$, counting all elements in $E_c^{10}$ as units.

Assistance in selecting candidate groups of files to be tested for possible
consolidation can also be obtained from the computer, in analogy with the
grouping phase. Thus groupings may enable consolidations to be made without
increase in transport. Such consolidations can be found by scanning the incidence
matrix and are, in fact, searched already in the grouping phase. Further,
similarity of columns in $E^{10}$ indicates feasibility for consolidation and can, of
course, be pointed out by simple algorithms analyzing $E^{10}$.
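The equipment measure mentioned above, the largest row-sum with every non-zero element counted as one unit, can be sketched in Python. The two small matrices are hypothetical and serve only to show a consolidation that lowers the measure; the function name `io_units` is introduced here for illustration.

```python
def io_units(E):
    """Largest number of files attached to any one process (row of E)."""
    return max(sum(1 for x in row if x != 0) for row in E)


# Hypothetical system with files a, b, c, d, e: +1 input, -1 output.
E = [
    [1, 1, 1, -1,  0],  # first process reads a, b, c and writes d
    [0, 0, 0,  1, -1],  # second process reads d and writes e
]
# The same system after consolidating a and b into one file (ab).
E_c = [
    [1, 1, -1,  0],     # columns ab, c, d, e
    [0, 0,  1, -1],
]

print(io_units(E), io_units(E_c))
```

The difference between the two values is the equipment reduction obtained by the consolidation tested.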
8. Influence on Programming Language Development.

Even with programming ... a lot of programming ... information systems
such as ..., define the processing pro... identifiers and set up linkages to the
appropriate processes. ... from the old file to the new file. 1) ... used in the
system.

We may conclude that our study of ... develop a new compiler technique which
... of elementary procedures. When the matrices G, C, and ... simplified task
of program... have already been established in the process of designing the
system. ... much of the more complicated ..., would ... new structure matrices
G and C ... replacement of one or more ... significantly.

1) See section 28.3
1. Files in Systems Using Mass Memories of Pseudo Random Access Type.

1 While it should have come out clearly that the discussion of information system
problems made in chapters 21 through 26 and also in the sections 1 and 2 of
chapter 27 is either completely independent of hardware or holds true for every
kind of auxiliary file storage used, this may seem less obvious as regards section
27.3. For the rest of part 2 it will not always hold true. Much of the discussion
in these latter portions of part 2 appears to be oriented towards the
problems associated with serial access file storage (e. g. magnetic tapes). It
is therefore of interest to find out the extent to which different results come out
if other types of file storage are considered.

2 It is important to note that all backing stores for large files are associated with 
an access time much longer than that of the central memory of the computer (-s) 
used in the system. (That is we take this as an axiom, which means that to the 
extent it is true our conclusions will be believed to be true.) As a consequence 
any use of a backing store will be connected with a data transport. Hence this 

is nothing specific to serial storage. Even if in the future mass memories 
would be as fast and random access as the present core memories, it seems 
probable that computers will again have still faster control memories so that 
the situation will become similar to the one we have today. 

3 It is further important to note that all mass storages are of a pseudo-random
access type rather than being truly "random access". This means that the
time needed to retrieve data will depend on where these data are stored in relation
to where the last retrieved data were stored.

4 We conclude that the problem of trying to reduce transport volume is still with
us. This was the topic of section 27.3. It also follows that the topological transport
factor for an information system is one of the relevant parameters also
when so-called random access stores are used.

5 It follows from 2 that process grouping will still be a possible tool for re¬ 
ducing transport. 

6 Likewise it follows from 3 that blocking file records to make for longer data
blocks and to reduce the storage space in the backing store may be of value and
thus may also need to be considered as an alternative to grouping processes, as
is discussed in 28.1.

7 We can now conclude that also the whole of chapter 28 is still relevant if we go 
from tape files to "random access" files. 

8 When "random access" (r-a) file storage is used exclusively the need to limit
the number of file stations to a very low value (10 or less) disappears. Therefore
one of the main arguments used in 29.1 and 29.2, which leads to consolidation
of files, is no longer relevant. Thus we need not seek the optimum
consolidation of files to come down to a required maximum number of file stations,
or find the most economical number. Hence one might expect to be using
less complex files when r-a storage is used.

9 On the other hand there still remain hardware reasons for consolidating files.
One reason for this is that a record of an elementary file may use only a small
part of the physical block or cell made available at one access operation. Therefore
if each elementary record is stored in its own block and if a process needs
e elementary files and each block takes b memory positions we need to allocate
at least e · b positions in memory for I/O-areas. If, for instance, f elementary
file records could be stored in one block then a consolidation of the corresponding
f elementary files into one file would reduce the memory area needed for
I/O by a factor of f, that is to e · b/f. If b is not too small then a reduction by
some factor f may be a necessity, thus forcing us to consider, again, the
problem of file consolidation. It may be expected to be less demanding in this
case, however.

Of course one could achieve similar results by blocking together f records of 
the same file. This, however, will cost more memory for buffer areas and 
processing for "un-blocking" records before they can be used. 
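The memory estimate of paragraph 9 amounts to a one-line calculation, shown here as a small Python sketch. The figures used (6 elementary files, blocks of 512 positions, 3 records per block) are hypothetical and the function name `io_area` is introduced here for illustration.

```python
def io_area(e, b, f=1):
    """Memory positions needed for I/O areas.

    e: number of elementary files the process needs
    b: memory positions taken by one block
    f: number of elementary records stored together in one block
    """
    return e * b / f


separate = io_area(e=6, b=512)           # each record in its own block
consolidated = io_area(e=6, b=512, f=3)  # three elementary files per block
print(separate, consolidated)
```

The ratio of the two results is the factor f by which consolidation reduces the I/O area.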

10 A solution to this problem could be to have r-a file storage with smaller blocks 
(smaller b value). This may, however, result in more accesses becoming ne¬ 
cessary for all required I/O. This could well be an inferior solution. 











2. Direct Processing versus Batched Processing, ^ 

1 When transactions are input data to a process which uses some files then, in
general, the files have to be scanned in total during the process. This is
almost always true when serial access file storage is used, but it is true even
for random access storages if the file data are obtained through "retrieval by
property" 2) or "content addressing" as this type of access has come to be
generally called now. This total file scan will introduce a considerable
transport and this will be independent of the number of transactions in the
batch of transactions taken as input to the process. As a consequence in these
circumstances there will be a file transport volume per transaction which is in
inverse proportion to the number of transactions in the input batch.
Consequently a batching of transactions has to be found which approximates the
optimum balance between reduction of file transport per transaction with larger
batches and an increased delay in the production of process results. Thus an
optimization problem has to be solved.
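
The balance described above can be illustrated by a minimal sketch. The file
transport per transaction is taken from the text (one total scan shared by the
whole batch); the linear delay penalty, and the names transport_per_transaction
and best_batch_size, are assumptions made for this illustration only and are
not part of the original analysis.

```python
def transport_per_transaction(file_volume, batch_size):
    """File transport charged to each transaction when one total scan serves the batch."""
    return file_volume / batch_size


def best_batch_size(file_volume, delay_cost_per_transaction, max_batch):
    """Batch size minimising transport plus an ASSUMED linear delay penalty."""
    def cost(n):
        # The delay term is hypothetical: waiting is charged in proportion to batch size.
        return transport_per_transaction(file_volume, n) + delay_cost_per_transaction * n

    return min(range(1, max_batch + 1), key=cost)
```

With another delay model the optimum moves, but the transport term keeps its
inverse proportion to the batch size.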

2 Now if random access file storage is used, and retrieval by name (direct
addressing) is possible, then it appears that direct processing rather than
batched processing should be used. It follows however from what has been said in
section 3 that this is not necessarily true. Thus direct processing of
transactions as they arrive in random order will call for frequent moving of
process programs from the backing store into the memory and, as was seen in 3,
will hence cause an increase in transport. Also a multiple transport of file
data will still occur because of direct processing, as it often occurs that
batching results in collecting several transactions associated with the same
file record.

3 Also it is not always true - perhaps it is only seldom true - that to let a
transaction wait in a batch collecting queue before being processed is a
drawback from the point-of-view of the managed system.

One fairly simple example of this is that one would not like to let the system
produce several bills to the same customer in the same day because several
input transactions have been processed directly. Rather a processing period of
one day or one week will be more ideal in many cases. More precisely: there
is in each system for each kind of processing a certain response time which
determines the ideal processing period. Then another period, longer or shorter,
would be used only if the balance of system response considerations and
information processing costs makes this economically efficient. 3) However
there are also principal reasons for using batch processing. For instance in a
payroll calculation where "team incentive" is being paid, the records for all
workers in the team must be available before the calculation can be finished.


1) See also examples of section 24.1. 

2) 3 La 1961. 

3) It is also known from industrial dynamics that too quick responses to 
changes will often call for dynamical instability which, of course, is 
exactly what information processing attempts to reduce, see 1FO 1961. 
4 Likewise when systems of equations have to be solved or, for instance, a
linear programming calculation has to be performed as a part of the information
processing, then the whole associated matrix, which may be made up from
many transaction records or a significant part of it, must be available.

5 It is possible however that in some cases where the numerical methods
presently used require a set of data to be available at the start, future
development - perhaps stimulated by a desire for direct processing - may produce
new methods which can do a stepwise direct transaction type of calculation. On
the other hand this does not appear to be generally possible. (It is however
already today a common situation in procedures for forecasting.)




Chapter 12. File organization 



















Record lay-outs. 

It was shown in chapter 5 how information requirements were traced backwards
in the precedence graph until they were defined in terms of what we shall now
name elementary files.

An elementary file will consist of data from a minimum of two information
classes so that each record in an elementary file (each elementary record)
contains a minimum of two terms. One of these is the sorting key, the other the
value of some function of this key. Thus an elementary file might be defined by
the pair: identification number, person name. Another might be: identification
number, salary. In many cases however it is not appropriate to use such small
elementary files. Then the elementary file will contain the key and two or more
information classes in a package that is in no case partitioned further. As an
example we may find that, in order to obtain a unique definition of all person
identification numbers, both the name and the address of the respective persons
are needed; the elementary record would thus, in this case, at least contain the
three terms identification number, name, address.

When elementary files are consolidated then the elementary records are
assembled into file records, and these file records obtain a certain structure
by the way the consolidation is done. It is in this way that the relatively
complicated records of business data files come into being. It is seen that one
might expect that the record formats are defined already at this stage of the
analysis. Thus we have another kind of work that should be done well in advance
of the actual, detailed programming, contrary to present usage.

Let us take the consolidated file (i, g, h, f) = F1 (say). We find

i goes into pr(b, i')
g goes into pr(a, e")
h goes into pr(a, e')
f goes into pr(a, e")

It will be convenient to be able to move that part of the record F1 (i, g, h, f)
which is to be copied only during a process. If we look at fig. 29.2.2 we see
that in process (b, i') i', g is to be moved from F1 to the output file (i', g)
in the cases when a record from F1 is not changed. Thus we find that the
organization of the records of F1 should be as shown in fig. 1. We have
introduced the name F11 for (i, g) and F12 for (h, f) as is seen in fig. 1.
We assume that the data sets i, g, h, f consist of elementary items in
the following way:

i = i1, i2, i3;
g = g1, g2;
h = h1, h2, h3, h4;
f = f1, f2, f3.

where i1 = g1 = h1 = f1 = the common sorting key which is also the key for
the whole file F1 and therefore will be denoted by F10.

With this information we now have the record lay-out for F 1 as shown in fig. 2. 



Fig. 2 

This record organization can be described by the Cobol way of using level
numbers with 01 for the record name, 02 for the data items or data sets on the
highest level within the record and so forth.


    01 F1
       02 F10
       02 F11
          03 i
             04 i2
             04 i3
          03 g2
       02 F12
          03 h
             04 h2
             04 h3
             04 h4
          03 f
             04 f2
             04 f3


We have, so far, seen how an organizational structure is imposed upon a record
by the overall structure of the information system. This is not all that is
needed to determine the record format however. Thus often the way records are
stored on tape, for instance, affects data transport time and processing time in
opposite ways. Thus the choice will depend on whether the file will be used in
processing where transport time dominates, or processing dominates, or neither.
The decision in this respect may have to wait until programming is being done,
although for some files the situation already at the time of data structure
analysis will be decisive.

The length of tape blocks will also have to be decided at the same time, and for 
some firms also the coding form will be open to special decision. 
2.


Record organization. 


Ideally the record of a file should be a function of the file only and not of
the program. Therefore the record description should be made when the file
organization is decided upon.

In any program using a file as input or output one should then, in the data
description part of the program, only need to name the file, or the record if
the file has several record types.

Alternatively the file might itself carry with it its record descriptions.

In actual practice the problem is a little bit more complicated for it may be
desirable to organize the record differently for different programs. This may
be the reason for having normally new record descriptions for each Cobol
program, for instance.

A better procedure would be to make use of the fact that the record is uniquely 
defined to some extent by the file and to some extent by the actual process. Thus 
one should have a basic record description which is defined by the file and an 
additional structural description which adds structure and "group names" to the 
basic record description. Only the latter then would have to be defined for the 
process. 

The basic record description would name all terms contained in the record, 
with type and size specification and a minimum of structure, for instance, as 
necessary by multiple occurrence of subrecords (arrays or "occurs declared 
groups"). 

The only additional structure requirements seem to be for the reasons of
making simple term names unique by qualification by natural groups (or
"elementary files"), or for reasons of enabling simple moving of record terms
not used in the actual process.

There might also be the reason that operations upon groups were to be made in 
a process. It is for instance quite feasible to do a comparison of groups in a 
process. 

Of these only the two last mentioned structure definitions should vary from one 
process to another. Only these should then have to be defined with each program. 
It would even be possible to let the moving of unused data be done automatically 
on the call of a special move procedure, for instance "move unprocessed". 

The advantage of concentrating the program description regarding records
only to describe additional structure is twofold. In the first place it would
reduce program work and reduce the need for programming when the file
organization is being changed. In the second place, and most important, it would
guarantee that the correct basic record description is being used whenever the
file is used in a process.
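
The separation between a basic record description and a process-specific
structural description can be sketched as follows. This is only an illustration
under stated assumptions: the names Term, F1_BASIC, add_structure and
move_unprocessed are hypothetical, and the field types and sizes are invented
for the example; only the term names follow the record F1 discussed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    kind: str  # "numeric" or "alpha" (illustrative type codes)
    size: int  # field size in characters (hypothetical values)


# Basic record description: defined once, together with the file.
F1_BASIC = (
    Term("F10", "numeric", 6),
    Term("i2", "alpha", 20),
    Term("i3", "alpha", 30),
    Term("g2", "numeric", 7),
    Term("h2", "numeric", 4),
    Term("h3", "numeric", 4),
    Term("h4", "numeric", 4),
    Term("f2", "numeric", 5),
    Term("f3", "numeric", 5),
)


def add_structure(basic, groups):
    """Attach process-specific group names, checking them against the basic description."""
    known = {term.name for term in basic}
    for group, names in groups.items():
        missing = [n for n in names if n not in known]
        if missing:
            raise KeyError(f"group {group} names unknown terms: {missing}")
    return dict(groups)


def move_unprocessed(record, basic, used):
    """Copy every term the process does not use, unchanged, towards the output."""
    return {t.name: record[t.name] for t in basic if t.name not in used}
```

A program would then state only its own groups, for instance F11 = (i2, i3, g2),
while types and sizes always come from the one description kept with the file.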
In 1950 the author made an experiment to determine the frequency of errors in
key-punching numeric punched-cards. It was found that 0.5 ‰ of the digits
were in error. During a subsequent verifying operation about 90 % of these were
detected leaving about 0.05 ‰ of erroneous digits. Unfortunately no other,
and larger, statistics are available to the author. The publication on this
subject does indeed need extension!

Some tests to determine the reliability of punching punched-tape numerically
were reported recently 1). About 1 ‰ of errors punched were found, that is
the same order as mentioned above for cards. When a "check-digit" was used
to control the numbers punched this detected about 99 % of the errors: thus
this check is 10 times better than verifying, when feasible. This is not all,
however, for the check digit will also detect the errors in the written material
from which punching was done. As this is not true for card verifying, and as
the error rate in written data probably is at least as large as in the punching
operation, verifying actually takes only 50 % of all the errors.

These figures show clearly that it is necessary to have a better control of
data than that provided by verifying. We have come to estimate the number of
errors done in punching to 1 ‰, and that in writing to likewise 1 ‰, and that
after verifying still the 1 ‰ of writing errors remain. If we input on the
average as little as 10 digits per second we would still input about 3 errors
per hour.



Chapter 14. Problem of optimum grouping of information



The problem of optimum grouping of information

The problem arises from the following assumptions.

Several processes use information from the same file. Therefore if ... one then
the file information would only have to be ...

"Same file" here means that also the sorting order is the same.

... on memory size will, generally, be a limit to the amount of grouping.
1 Let S be the set of elementary processes 1, 2, 3, ... The pair saving will
be defined as the function $v_{ij}$ over $S \times S = S^2$.

2 We order $S^2$ in decreasing values of $v_{ij}$, and then denote it by $S^2$.
We say that the pairs i, j and j, k (or k, j) are connected by j. Likewise we
call $v_{ij}$ connected with $v_{jk}$ (by j).

... be the set of pairs in $S^2$ which has the maximum value of $v_{ij}$
(i and j uniquely occurring in ...).



5 A virtually more common simple special case is when different pairings cause
the same reduction in transport volume but instead the memory requirement
has to be considered. This for instance is the case when several transaction
items are to be processed against the same master file.

This problem is not too difficult to handle even if no procedure is available
and even for more general grouping than just pairing. The reason is that we
never have to handle a large number of different pairs. Thus if the number of
elements is high then it is easy to find pairs of equal size for almost all
elements. Further if memory size permits many processes in each group it is
easy to group into combinations that fill the memory fairly well - just because
in this situation there must be many "small" processes (in the sense of using
but little memory).
Thus when many "small" processes are associated with many transaction
items for one file it will be natural as a first approximation to take them
together into much fewer groups which are still small enough to be feasible for
further grouping by two or three, or four, in a final, more system oriented
analysis. As an example we may consider the case 1) where each of the
elementary files a, b, c, d, e could well consist of several smaller files. The
natural objection here would be that the first grouping might have been done in
a more even way so that the large difference between for instance vol(a) = 500
and vol(e) = 1500 would not have occurred. Even this however is realistic
because there will often be some connections between some of the initial
elementary transaction items which makes only some grouping feasible at the
first stage.


1) Example in 28.4. 
Part III


SOME DATA PROCESSING PROBLEMS 
1.


Relation between a process and its files. 


The transport volume associated with a certain file and a process is of course
very significantly dependent on whether the process will call for a single scan
of a file only or a multiple scan. This depends on how intricate the connection
between the process and its files is. As always this will be heavily dependent
on the size of the main memory available to the process. More precisely the
multiplicity of file scan will depend on how much of the file must be
simultaneously accessible to the process and how this is related to the memory
space available. This is a question that is intimately connected with the
question of information precedence relations and does therefore belong to the
information system analysis, rather than programming. The need for file sorting,
in itself a multiple scan process, is also caused by relations that exist
between the different data and their processes.
2.


Some basic problems of file processing. 


An electronic computing system, or computer for short, consists typically of 
a processor (or perhaps a set of them) and a set of memory positions with fast 
access, which we shall call the memory. When a computer works with its 
program and data stored in the memory, it is utilizing its highest possible speed. 

When the data volumes and programs are too large to be stored entirely in the 
memory, then a back-up store or file store has to be used. We then talk about 
file processing. We have the problem here that this file store is slower than 
the memory. Hence the computer will lose time in transporting data between 
store and memory. The data and program then have to be organized in such a 
way as to minimize this loss of time. 

In the worst case, "the slow extreme", something like every fifth program step 1)
will ask for data in random order from out of a data set that is much larger
than the memory space and with repeated access to most data. In this case
the effective operation speed of the computer goes down from an order of
magnitude of the memory cycle time to that corresponding to about 1/5 of the
average random access time in the store, which means, usually, a slow-down by a
factor of 100 to 100000. This is a problem type that is especially bad for
magnetic tape stores. This extreme, however, almost never occurs in practice.
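
The order of magnitude of this slow-down can be checked with a small sketch.
The function name slow_extreme_slowdown and any timing figures passed to it are
assumptions made for illustration; only the "every fifth program step" ratio
comes from the text.

```python
def slow_extreme_slowdown(memory_cycle_time, random_access_time, steps_per_access=5):
    """Effective time per program step in the slow extreme, relative to the memory cycle time.

    Every `steps_per_access`-th step waits for one random access to the store,
    so the access time is spread over that many steps.
    """
    time_per_step = memory_cycle_time + random_access_time / steps_per_access
    return time_per_step / memory_cycle_time
```

With a hypothetical memory cycle of 2 microseconds and an average random access
time of 0.1 second the factor is about 10000, which lies inside the range of
100 to 100000 quoted above.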

On the other extreme, "the fast extreme" - which instead occurs so often as to
be typical for business data processing - each data item will only have to be
moved once from store to memory (or from memory to store) whereupon it may be
subject to several internal computer operations.

Further the items will in this extreme almost always be taken from (or put to) 
a part of the store to which access time is much smaller than the average one. 

In this case the effective operation time of the computer retains its original 
order of magnitude. This type of problem is the ideal one for magnetic tape 
stores but will also give minimum time in other kinds of file store. 

In most problems of engineering or scientific data processing, as well as in an
increasing number of administrative problems, the situation will be one that
comes between the two extremes. All too often it is then hastily assumed to be
a problem of the slow extreme, and supposed to be impossible for magnetic
tapes. A careful organization of the processing and data will in most of these
cases lead to reasonable process time and make tape systems as good as the
other ones available. The new larger internal computer memories play an
important role in this connection. It follows from what has been said that in
connection with file processing applications it is important to be able to
estimate, already on the basis of some few characteristics of the problem, the
extent to which file storage will reduce the effective processing speed.


1) It is not possible, in normal processors, to have each program step
call for access to the backing store. This is because of the existence of
"housekeeping" operations.
operators operate upon Sp(r). Then the kernel set Sm(r) can be regarded as the
kernel (in the common algebraic sense) of Sw(r) in that it is the part of Sp(r) not 
wasted (by the operation Sw(r) ). 


Fig. 1


3 Let So be the memory space available. Then Sw(r) ≤ So for all r.


3.


K-progressive process.


We see that only if at each stage the memory space made available is sufficient
for the increment set of a file at this stage can the file be linear in that
process. In order to state this with more precision we define a concept which
we shall call K-progressive process. A process shall be said to be K-progressive
if for all R and all files

\[ \sum_{r=1}^{R} \mathrm{Sw}(r) - \sum_{r=1}^{R} \mathrm{Sm}(r) - \sum_{r=1}^{R} \sum_{f=1}^{F} \mathrm{Sf}(r) \ge K \qquad (K \text{ a constant}) \qquad (1) \]

or $S_{\min}(R) = K$.

If a process is K-progressive it follows from 1 that after the step r-1, for
instance, free memory space ≥ K is available for the next stage, that is for
stage r. For all F files involved the space required is

\[ \sum_{f=1}^{F} \mathrm{Sf}(r). \]

We say that the memory covers K and also: the K-progressive process is
memory-covered. After each stage of a K-progressive process the memory covers K.

A sufficient condition for a K-progressive process to be linear is

\[ \sum_{f=1}^{F} \mathrm{Sf}(r) \le K \text{ for each } r, \text{ and } K \le \mathrm{So} \]

So = memory available for input at start.

for if the process is K-progressive then it follows that after any step R there
has been made free a storage space which at least has a magnitude of K. (The
condition is thus not a necessary one and sharper estimates might be obtained.)

In business processing each stage corresponds to a complete
process which means that then Sm(r) = 0 for all r. (More precisely Sm(r) =
const = memory space for some accumulated control totals. This, however,
is trivial and may be taken into the working area of the program itself.) It
also follows that at each stage

\[ \sum_{f=1}^{F} \mathrm{Sf}(r) = \mathrm{Sw}(r) \]

so that we only have to choose K > Sw(r)max to make sure that we have a
K-progressive file.

If Σ Sf(r) > K a K-progressive file will not be linear. However, the smaller
Σ Sf(r) - K = dK the smaller will be the access time to a backing store of
pseudo-serial type. Further, the probability of an operation calling for data
from the backing store will also be smaller when dK becomes smaller.
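
The sufficient condition for linearity (the sum of Sf(r) over all files not
exceeding K at each stage, with K not exceeding So) and the excess dK can be
checked mechanically. The following is a minimal sketch; the function names and
the representation of a process as a list of per-stage increment sizes are
assumptions made for this illustration.

```python
def stage_demand(increments):
    """Space required at one stage: the sum of Sf(r) over all files f."""
    return sum(increments)


def is_linear(increments_by_stage, K, S0):
    """Sufficient (not necessary) condition: sum_f Sf(r) <= K for each r, and K <= S0."""
    return K <= S0 and all(stage_demand(stage) <= K for stage in increments_by_stage)


def excess(increments_by_stage, K):
    """dK per stage: how far the demand exceeds K (0 where the stage is covered)."""
    return [max(0, stage_demand(stage) - K) for stage in increments_by_stage]
```

A stage with a small positive dK is not covered, but according to the text it
costs less access time than a stage with a large one.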
4. Rectangular file


1 In some situations the process requires access to the file data at several
different occasions so that linear processing is not possible x) but still is
such a special case that a far better situation is obtained than that
encountered in the slow extreme. One common instance of this kind is what we
call rectangular file processing. In this two files F1 and F2 are matched in the
way that one of them, corresponding to the column F2 of fig. 1, is first run in
a linear process together with a part, F11, of F1. Then F2 is rewound (perhaps
while replaced by another copy of F2 to save time) after which F2 is run
together with F12 and so forth until F2 has run "linearly" with all parts of F1,
shown as lines in the rectangular array for F1 in fig. 1. Each linear process
typically means that the records of F2 are taken one at a time, in sequence, and
the associated record of F1i (for i = 1, 2, 3, ... in sequence as described), if
any, is retrieved. When the row number i (say) of F1 has been processed against
F2 the result is put out onto F3 as one or more records.


Fig. 1 (rows F11, F12, ..., F16 of F1; one repetition of F2 for each row of F1)

2 The rectangular process is seen to be linear with regard to the "rectangular
file" F1 and be repetitive for the "column" F2. If r is the number of rows in F1
(with respect to the process in question) then the latent need for repetition of
F2 is r times.


3 If memory space permits F2 to be stored completely then the process

F1, F2 -> F3

can be run as a linear process.

4 If F2 cannot be held in memory one may find it possible to store instead
several rows of F1 simultaneously. Suppose a group of g rows of F1 can be held
in memory at the same time with space left for the computer program also. Then
any time a record of F2 is in memory it can retrieve associated records from
all the g rows of F1 in the group. The result will be that each scan of F2
handles g rows of F1, rather than one. The number of scans of F2 will then be
reduced to r/g by this group access technique.


x) and we do not have simply a retrieval situation, cf. section 5 below.
Notice that from a file processing point of view the rectangular file will have
r/g rows rather than r rows, which is the "logical" number of rows.

5 If v1 is the volume of F1 (from a transport point-of-view) and v2 and v3 are
those of F2 and F3 then we see that the transport volume for the rectangular
process, using group access with group size g, will be

\[ t = v_1 + (r/g)\, v_2 + v_3 \]

6 If instead we would store a part f2 of F2 in memory during a set of runs, and
F2 would consist of z such parts, it may in some cases be possible to run the
process in z stages whereby the output may also have to be updated in each run.
We would thus have the transport volume

\[ t = z v_1 + v_2 + 2 z v_3' \]

where v3' = the average size of the intermediate output which eventually
becomes F3.
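
The two transport volumes can be compared numerically. The sketch below follows
the formulas t = v1 + (r/g) v2 + v3 and t = z v1 + v2 + 2z v3'; the function
names are illustrative, and rounding r/g up to a whole number of scans is an
assumption added here, since a partial group still needs a full scan of F2.

```python
import math


def group_access_transport(v1, v2, v3, r, g):
    """Transport volume when g rows of F1 are held in memory per scan of F2."""
    scans = math.ceil(r / g)  # assumption: a partly filled last group still costs one scan
    return v1 + scans * v2 + v3


def staged_transport(v1, v2, v3_avg, z):
    """Transport volume when F2 is held in memory in z parts and the output is updated each run."""
    # Each of the z runs reads F1 once and both reads and writes the intermediate output.
    return z * v1 + v2 + 2 * z * v3_avg
```

Which scheme is cheaper depends on the relative sizes of F1 and F2: group access
repeats F2, while the staged scheme repeats F1 and the intermediate output.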

7 One example of a rectangular file processing is the multiplication of a
rectangular matrix by a column if performed in the traditional way. If F2 is
very large a non-traditional multiplication scheme, "the multiplication by
column selection", which calls for a linear process only followed by a sorting
routine, will in general be better.

8 Another example of a rectangular process is a completely random retrieval
operation as we shall see below 1). It is interesting that such a completely
random process can be done in such a relatively orderly fashion as by a
rectangular process.

9 A matrix by matrix multiplication of the traditional type will correspond to a
doubly rectangular process. In this r denotes the number of rows divided
by g if group access is being used.


1) Section 5 
5.


Retrieval of file records for a process. 


We have mentioned 1) that when a process is memory-covered then it can be
run as linear and will thus require only a single scan of the files involved.
When the process is not memory-covered then the file information will have to be
transported to the memory a number of times depending on how often such data
are requested by the process. In addition the access time to these data will
depend on whether the whole file or only a narrow band of it is active at each
phase of the process. When file data from all over the file are requested at
random at almost every operation we have the "slow extreme" case.

In many situations the process will request every datum only once. Even if then
these requests are in completely random sequence we still have a much better
situation than the slow extreme where each datum will be requested several
times. Here instead it is requested only once and as a rule several operations
will be done between each such request. We shall call such a situation where
file data are needed only once a retrieval of information. 2) Thus we shall say
that when the process needs some file data it has the problem of how to retrieve
these data.

2 The problem of retrieving data is associated with

a identifying the group of data required
b localizing these data
c fetching the data and bringing them into the memory of the computer.

3 To identify the group of data is to specify which properties it is required to
have. A group of data has properties defined by its structure (which terms are
in the group) and the values of its different terms. If data are stored in
compact form the structure cannot be sensed from its stored representation.
Hence only the value associated with a specified part of a record - or set of
parts - can be used in practice to check the data properly. As a result the
"properties" of a record are identified with the values of its terms (elementary
items). In the special case when the record is stored in an ordered sequence of
the values of one term (or a set of terms) this will be said to be the (logical)
name of the record 3). The same will hold in any case when a record is stored in
a cell which can be localized by a process using the term value as an input.
Obviously when we want to retrieve a record specified by the value of its name
we are in a best position. We shall say that we then have a retrieval by
name. 4)

4 It may of course also happen that we want to retrieve records which are
specified by the value of some of their other terms, that is by a property other
than the name. One way of doing this might be to re-order the data file to make
this property the new name. This is the reason for so often using sorting of
file data as a prerequisite to (other) processing.

5 If we do not re-order before retrieving we are in the situation of having to
do a retrieval by property. 3)


1) section 2 

2) see Langefors: Information Retrieval in File Processing BIT, Hefte 1, and 
Hefte 2, Bind 1, (1960). 3 La, 1961-1. 

3) it is also often called the "key" 

4) 3 La 1961-1. 
6 A third possibility is to use an auxiliary file - or to partition the file -
such that the retrieval by property is done on a much smaller file and gives as
its result a retrieved record or set of records from which the names of all
records having the desired property value are defined. This then enables the
retrieval of the whole file record to be done as a retrieval by name. This may
be of advantage if the retrieval by property is slower for a larger file. This
we have seen to be the case not only for serial memories but also for pseudo
random access memories. A second argument for this to pay is when retrieval by
name is much faster than retrieval by property. This is true especially for
random access or pseudo random access. This is the more so as r-a memories are
generally bad in connection with retrieval by property. 1)
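
The three ways of retrieving discussed in this section can be contrasted in a
short sketch. It is an illustration only: records are modelled as dictionaries
of term values, the file addressed by name as a dictionary keyed on the name,
and all function names are assumptions introduced here.

```python
def retrieve_by_name(file_by_name, name):
    """Direct addressing: one access, independent of the size of the file."""
    return file_by_name.get(name)


def retrieve_by_property(records, term, value):
    """Content addressing: every record of the file has to be inspected."""
    return [rec for rec in records if rec[term] == value]


def build_auxiliary_file(records, name_term, property_term):
    """Small auxiliary file: property value -> names of the records having it."""
    auxiliary = {}
    for rec in records:
        auxiliary.setdefault(rec[property_term], []).append(rec[name_term])
    return auxiliary


def retrieve_via_auxiliary(file_by_name, auxiliary, value):
    """Look the names up in the auxiliary file, then retrieve each record by name."""
    return [file_by_name[name] for name in auxiliary.get(value, [])]
```

The auxiliary file holds only property values and names, so the search by
property runs over far less data than the full records would require.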

7 Example. In a book giving surveys of computer application it was stated:
"Another firm requirement ... is a random access type of storage ... As an
example, accounting may require 10000 or 20000 TV spots to be controlled
alphabetically by cities. Media may require the same information by decreasing
market value, .... We want to point out that in all of this re-use it is not a
question of pulling out every third, fourth, or tenth item, but rather of using
the material in a completely different sequence each time; complete dependency
on tape stations would be a mistake." Comments, please.


1) [ 3 La 1961-1 ] 
1.


Influence of word length on tape recording speed

The numerical terms (elementary items) in common data files have a length ...
When words of w bits are used the average number of non-used word bits
in a term storage is w/2. We say that the word boundary allowance is w/2 due
to the non-uniform lengths of the different terms.

We see that packing can be used to eliminate or reduce the boundary allowance.
We see that complete omission of packing, i.e. complete carrying of boundary
allowance, would increase the storage space from an average of 23 to 23 + w/2,
or by

\[ \frac{w/2}{23 + w/2} \cdot 100 \]

per cent (very roughly giving an allowance of 100 ... per cent). This would
be the reduction of average tape length for the storage of one term if there
would be no inter-record gap allowance.

If the inter-record gap has a length which corresponds to g words and the
record length is r words, then we have a "record gap allowance" of g/r
words per data word or 100 g/r per cent of length. If this is
taken into account we obtain the reduced allowance,

\[ \frac{w/2}{(23 + w/2)(1 + g/r)} \]

By this rate then tape reading and writing speed is increased by taking care
of the non-uniform term length by packing. To do this on the other hand always
calls for means which require memory space and decrease internal processing
speed.

Table 1: $\dfrac{100\, w/2}{(23 + w/2)(1 + g/r)}$, max term length = 45 bits

                 w =    6    24    48

    g/r = 2             4    11    17
    g/r = 1             6    17    25
    g/r = 0.2          10    29    43




Table 2: $\dfrac{100\, w/2}{(18 + w/2)(1 + g/r)}$, max term length = 38 bits

                 w =    6    24    48

    g/r = 2             5    13    19
    g/r = 1             7    20    29
    g/r = 0.2          12    33    48
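
The entries of tables 1 and 2 follow from the reduced allowance formula
100 (w/2) / ((m + w/2)(1 + g/r)), with m = 23 for table 1 and m = 18 for
table 2. The sketch below recomputes them; the function name and the parameter
name mean_term_bits are illustrative choices, not taken from the text.

```python
def packing_gain_percent(w, g_over_r, mean_term_bits=23):
    """Reduced word boundary allowance in per cent of tape length.

    w is the word length in bits, g_over_r the inter-record gap per record
    length (both in words), mean_term_bits the average term length.
    """
    return 100 * (w / 2) / ((mean_term_bits + w / 2) * (1 + g_over_r))


def allowance_table(mean_term_bits, word_lengths=(6, 24, 48), gap_ratios=(2, 1, 0.2)):
    """One row per g/r value, one column per word length, as in tables 1 and 2."""
    return [
        [packing_gain_percent(w, ratio, mean_term_bits) for w in word_lengths]
        for ratio in gap_ratios
    ]
```

The computed values agree with the tabulated ones to within the rounding used
in the tables.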



In addition to this allowance we also have, in non-binary machines, a code
allowance caused by using coded forms for storage which take more space.
Thus for instance in 6-bit word machines (so-called variable word length
machines) it is common to have one decimal digit in each word only, when
using numeric data. As one digit corresponds to 3.3 bits or less this
corresponds to a code allowance of (6 - 3.3)/6 = 45 %. This has been indicated
in diagram 1. Note that this allowance is equal to the word allowance even for
as long words as 48 bits.



Diagram 1. max term length = 45 bits
As is seen the gain in tape speed to be achieved by packing data, that is using
non-uniform field length (a field being the space allocated to store one term)
is rather small for a word length of 24 bits or less (even if we have used a
rough estimate only). This is especially so when short records are used (small
values of r/g).


For instance if r/g = 1 then the record gap allowance is 100 %, whereas if
r/g = 5 the allowance is only 20 %, so that an increase of r/g from 1 to 5 would
save 80 %. It is thus seen to be much more desirable to use memory space for
reducing record gap allowance than to reduce word boundary allowance, that is
to use term packing, if the words are not too long, for instance 24 bits or
less. When term packing is built into the computer hardware there is of course
no choice, and term packing will be used whether advantageous or not. When term
packing is by programming (software) it will often be wise not to use term
packing for all terms. Still, if some groups of terms happen to be very short,
or when some subrecords occur a number of times, the pay off for memory
space used for term packing these parts may be so high as to justify this
packing, while the remaining terms of the record go unpacked.